Word embedding code: trying to implement a UTF-8 checker
Question:
I’m having trouble getting my code to run properly. I tried to implement a UTF-8 checker, but it’s causing problems for other parts of the code.
This is the code:
pd = codecs.open("r8-train-all-terms.txt", mode="r", encoding="utf-8")
# pd = open("r8-test-all-terms.txt", errors="strict", encoding="utf-8")
train = pd.read_csv("r8-train-all-terms.txt", header=None, sep='\t')
test = pd.read_csv('r8-test-all-terms.txt', header=None, sep='\t')
This is the error I’m getting:
File "C:\Users\smust\PycharmProjects\pythonProject1\main.py", line 21, in <module>
train = pd.read_csv("r8-train-all-terms.txt", header=None, sep='\t')
File "C:\Users\smust\AppData\Local\Programs\Python\Python39\lib\codecs.py", line 743, in __getattr__
return getattr(self.stream, name)
AttributeError: '_io.BufferedReader' object has no attribute 'read_csv'
Answers:
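I’m guessing that your local variable pd is overriding the pandas import that you really need. Once you assign pd = codecs.open(...), the name pd refers to a file handle rather than the pandas module, and a file handle has no read_csv attribute. Rename your open file handles to something else: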
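import codecs
import pandas as pd
# Use names other than pd for the file handles so the pandas alias stays intact.
fh = codecs.open("r8-train-all-terms.txt", mode="r", encoding="utf-8")
fh2 = open("r8-test-all-terms.txt", errors="strict", encoding="utf-8")
# The files are tab-separated, so the separator is '\t'.
train = pd.read_csv(fh, header=None, sep='\t')
test = pd.read_csv(fh2, header=None, sep='\t')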
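If it helps to see why the traceback mentions _io.BufferedReader, here is a minimal sketch that reproduces the same AttributeError without pandas. The temporary file, its contents, and the name pd_shadowed are made up for illustration; pd_shadowed stands in for the pd name after it has been reassigned to the file handle.

```python
import codecs
import os
import tempfile

# Write a small tab-separated file to read back (hypothetical sample data).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("label\tsome text\n")
    path = tmp.name

# Stand-in for "import pandas as pd" followed by "pd = codecs.open(...)":
# after the assignment, the name refers to a file handle, not a module.
pd_shadowed = codecs.open(path, mode="r", encoding="utf-8")
try:
    # codecs.open wraps a binary stream and forwards unknown attributes
    # to it, which is why the error names the underlying stream type.
    pd_shadowed.read_csv
except AttributeError as exc:
    message = str(exc)
finally:
    pd_shadowed.close()
    os.remove(path)

print(message)
```

The attribute lookup falls through to the wrapped binary stream, so the message refers to that stream rather than to pandas.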
Pandas read_csv() can take a filename as well, and some encoding parameters, so this might be a second way to do the same thing (note that encoding_errors needs pandas 1.3 or newer):
import pandas as pd
train = pd.read_csv("r8-train-all-terms.txt", header=None, sep='\t', encoding='utf-8')
test = pd.read_csv('r8-test-all-terms.txt', header=None, sep='\t', encoding='utf-8', encoding_errors='strict')
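If the goal is only to check that a file is valid UTF-8 before handing it to pandas, that can be done separately with the standard library. This is a rough sketch, not a definitive implementation: the function names first_invalid_utf8_offset and check_file are hypothetical, and it reads the whole file into memory, which assumes the files are reasonably small.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return exc.start
    return None


def check_file(path: str) -> Optional[int]:
    """Read a file as raw bytes and run the same check on its contents."""
    with open(path, "rb") as fh:
        return first_invalid_utf8_offset(fh.read())


print(first_invalid_utf8_offset("héllo\tworld".encode("utf-8")))
print(first_invalid_utf8_offset(b"ab\xffcd"))
```

Calling check_file("r8-train-all-terms.txt") would return None for a clean file, or the offset of the first bad byte otherwise, and none of the names involved collide with pd.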