Pandas read_csv is retrieving different data than what is in the text file

Question:

I have a .txt (notepad) file called Log1. It has the following saved in it: [1, 1, 1, 0]

When I write a program to retrieve the data:

import pandas as pd

# Raw string so the backslash in the Windows path is taken literally
Log1 = pd.read_csv(r'Path...\Log1.txt')
Log1 = list(Log1)
print(Log1)

It prints: ['[1', ' 1', ' 1.1', ' 0]']

I don't understand where the ".1" on the third number is coming from. It's not in the text file; pandas just adds it.

Funny enough, if I change the numbers in the text file to [1, 0, 1, 1], it does not add the .1. It prints ['[1', ' 0', ' 1', ' 1]']

It's very odd that it acts this way. Does anyone have an idea why?

Asked By: Sam

||

Answers:

This should work. Can you please try this:

log2 = log1.values.tolist()

Output:

[['1'], ['1'], ['1'], ['0']]

Answered By: Rohit

Your data is not in a CSV format. In CSV you would rather have

1;1;0;1

or something similar.
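This also seems to be where the ".1" in the question comes from: by default read_csv treats the first line as the header row, and pandas renames a duplicate column name by appending a numeric suffix. Below is a minimal sketch, assuming the file holds exactly the single line from the question; io.StringIO is only a stand-in for the real file.

```python
import io

import pandas as pd

# Stand-in for the contents of Log1.txt (assumed to be this single line).
text = "[1, 1, 1, 0]"

# Default behaviour: the only line is consumed as the header row, so the
# pieces between the commas become column names and no data rows remain.
df = pd.read_csv(io.StringIO(text))
columns = list(df.columns)
print(columns)   # the second ' 1' is renamed ' 1.1' to keep names unique
print(len(df))   # no data rows are left

# header=None tells pandas that the first line is data, not column names.
df_no_header = pd.read_csv(io.StringIO(text), header=None)
print(df_no_header.shape)  # one row, four columns
```

With [1, 0, 1, 1] the pieces are '[1', ' 0', ' 1' and ' 1]', which are all distinct, so nothing gets renamed.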

If you have multiple lines like this, it might make sense to parse this as CSV, otherwise I’d rather parse it using a regexp and .split on the result.
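As a rough sketch of the regexp-plus-split idea, assuming the file content is a single bracketed, comma-separated, non-empty list of integers (the variable names here are only illustrative):

```python
import re

# Stand-in for the text read from the file (assumed single-line content).
raw = "[1, 1, 1, 0]"

# Strip the square brackets and any whitespace with a regular expression...
inner = re.sub(r"[\[\]\s]", "", raw)

# ...then split on the commas and convert each piece to an int.
numbers = [int(piece) for piece in inner.split(",")]
print(numbers)
```

This would fail on an empty list or on non-integer entries, so it is only a starting point.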

Proposal: Add a bigger input example and your expected output.

Answered By: Michael Kopp

Well, I worked out some other options as well, just for the record:

Solution 1 (plain read – this one gets a list of strings, one per line)

log4 = []
with open('log4.txt') as f:
    log4 = f.readlines()
print(log4)

Solution 2 (convert to list of ints)

import ast
with open('log4.txt', 'r') as f:
    inp = ast.literal_eval(f.read())
print(inp)

Solution 3 (old school string parsing – convert to list of ints, then put it in a dataframe)

import pandas as pd

with open('log4.txt', 'r') as f:
    mylist = f.read()

mylist = mylist.replace('[','').replace(']','').replace(' ','')
mylist = mylist.split(',')

df = pd.DataFrame({'Col1': mylist})
df['Col1'] = df['Col1'].astype(int)
print(df)

Other ideas here as well:

https://docs.python-guide.org/scenarios/serialization/

In general, reading from a text file (deserializing) is easier if the file was written in a well-structured format in the first place: a CSV file, pickle file, JSON file, etc. In this case, ast.literal_eval() worked well because the data was written out as a list in its __repr__ format, though honestly I’ve never done that before, so it was an interesting solution to me as well 🙂
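For example, if you control how the file is written, a JSON round trip might look like the sketch below. The file name and the temporary directory are hypothetical and used only for illustration.

```python
import json
import os
import tempfile

data = [1, 1, 1, 0]

# Hypothetical file in a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "log1.json")

# Serialize the list in a structured format...
with open(path, "w") as f:
    json.dump(data, f)

# ...so that reading it back yields a real list of ints again.
with open(path) as f:
    restored = json.load(f)

print(restored)
```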

Answered By: topsail