Word embedding code: trying to implement a UTF-8 checker

Question

I’m having trouble getting my code to run properly. I tried to implement a UTF-8 checker, but it’s causing problems for other parts of the code.

This is the code:

pd = codecs.open("r8-train-all-terms.txt", mode="r", encoding="utf-8")
# pd = open("r8-test-all-terms.txt", errors="strict", encoding="utf-8")

train = pd.read_csv("r8-train-all-terms.txt", header=None, sep='\t')
test = pd.read_csv('r8-test-all-terms.txt', header=None, sep='\t')

This is the error I’m getting:

File "C:\Users\smust\PycharmProjects\pythonProject1\main.py", line 21, in <module>
    train = pd.read_csv("r8-train-all-terms.txt", header=None, sep='\t')
File "C:\Users\smust\AppData\Local\Programs\Python\Python39\lib\codecs.py", line 743, in __getattr__
    return getattr(self.stream, name)
AttributeError: '_io.BufferedReader' object has no attribute 'read_csv'

Asked By: Shahd Mustafa

Answer 1

I’m guessing that your local variable pd is overriding the pandas import that you really need: after pd = codecs.open(...), the name pd refers to a file object, which has no read_csv method. Rename your open file handles to something else:

import codecs
import pandas as pd

# Use names other than pd for the file handles so the pandas alias stays intact
fh = codecs.open("r8-train-all-terms.txt", mode="r", encoding="utf-8")
fh2 = open("r8-test-all-terms.txt", errors="strict", encoding="utf-8")

train = pd.read_csv(fh, header=None, sep='\t')
test = pd.read_csv(fh2, header=None, sep='\t')

pandas read_csv() also accepts a filename directly, along with the encoding and encoding_errors parameters (encoding_errors needs pandas 1.3 or newer), so this might be a second way to do the same thing:

import pandas as pd

train = pd.read_csv("r8-train-all-terms.txt", header=None, sep='\t', encoding='utf-8')
test = pd.read_csv('r8-test-all-terms.txt', header=None, sep='\t', encoding='utf-8', encoding_errors='strict')
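
If the goal is only to confirm that the files really are valid UTF-8 before handing them to pandas, a strict decode of the raw bytes is one way to check. This is a minimal sketch using only the standard library, not something from the original code: the helper name find_utf8_error is made up for illustration, and the file names are the ones from the question.

```python
from typing import Optional


def find_utf8_error(path: str) -> Optional[str]:
    """Return None if the file decodes as UTF-8, else a short description.

    Hypothetical helper for illustration; it reads the whole file into memory.
    """
    with open(path, "rb") as fh:  # binary mode, so nothing is decoded implicitly
        data = fh.read()
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded
        return f"invalid UTF-8 at byte {exc.start}: {exc.reason}"
    return None


if __name__ == "__main__":
    # File names taken from the question; adjust to your own paths.
    for name in ("r8-train-all-terms.txt", "r8-test-all-terms.txt"):
        try:
            problem = find_utf8_error(name)
        except FileNotFoundError:
            print(f"{name}: file not found")
            continue
        print(f"{name}: {problem or 'valid UTF-8'}")
```

Because the check works on bytes and never touches the name pd, it cannot collide with the pandas alias the way the original file handle did.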

Answered By: DraftyHat
