Reading pandas dataframe that contains dictionaries in cells from csv

Question:

I saved a pandas dataframe that looks like the following as a csv file.

    a
0 {'word': 5.7}
1 {'khfds': 8.34}

When I attempt to read the dataframe as shown below, I receive the following error.

df = pd.read_csv('foo.csv', index_col=0, dtype={'str': 'dict'})

TypeError: data type "dict" not understood

How do I read the CSV file so that the dataframe comes back in the same form as when it was created? I have also tried reading without the dtype argument, and replacing ‘dict’ with alternatives such as ‘dictionary’, ‘object’, and ‘str’.

Asked By: TommyTorty10

Answers:

A CSV file stores only text, so each dictionary is written out as its string representation and read back as a plain string. To recover the dict, you need to parse that text as a Python literal. One way is to use ast.literal_eval:

import pandas as pd
from ast import literal_eval
from io import StringIO

mystr = StringIO("""a
{'word': 5.7}
{'khfds': 8.34}""")

df = pd.read_csv(mystr)

df['a'] = df['a'].apply(literal_eval)

print(df['a'].apply(lambda x: type(x)))

0    <class 'dict'>
1    <class 'dict'>
Name: a, dtype: object

However, I strongly recommend against using Pandas to store dictionaries: an object column holds only pointers to them. Pandas works best with contiguous memory blocks, so where possible split numeric data out into numeric series.

Answered By: jpp
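
To illustrate the recommendation above, here is a minimal sketch that unpacks the parsed dicts into a long table with a single numeric column. The column names row, key and value are chosen for illustration only and do not come from the question.

```python
import pandas as pd
from ast import literal_eval
from io import StringIO

# Same sample data as in the answer above.
csv_text = StringIO("""a
{'word': 5.7}
{'khfds': 8.34}""")

df = pd.read_csv(csv_text)
df["a"] = df["a"].apply(literal_eval)

# One output row per (original row, dict key) pair.
# "row", "key" and "value" are illustrative column names.
long_df = pd.DataFrame(
    [(idx, key, value) for idx, d in df["a"].items() for key, value in d.items()],
    columns=["row", "key", "value"],
)

print(long_df)
print(long_df.dtypes)
```

The value column ends up as a float column rather than an object column, which is the layout the answer argues for.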

You can also use Python's built-in eval. Note that eval executes whatever expression a cell contains, so only use it on files you trust:

import pandas as pd
from io import StringIO

mystr = StringIO("""a
{'word': 5.7}
{'khfds': 8.34}""")

df = pd.read_csv(mystr)

df['a'] = df['a'].apply(eval)

print(df['a'].apply(lambda x: type(x)))

0    <class 'dict'>
1    <class 'dict'>
Name: a, dtype: object

Answered By: harshlal028
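
A short sketch of the practical difference between eval and ast.literal_eval: literal_eval only accepts Python literals and refuses anything else, whereas eval would run it. The hostile cell below is a made-up example, not data from the question.

```python
from ast import literal_eval

# A normal cell parses to a dict.
cell = "{'word': 5.7}"
parsed = literal_eval(cell)
print(parsed)

# Hypothetical cell containing a function call instead of a literal.
# eval(hostile) would execute it; literal_eval refuses to.
hostile = "__import__('os').getcwd()"
try:
    literal_eval(hostile)
    rejected = False
except (ValueError, SyntaxError):
    rejected = True

print(rejected)
```

For CSV files that come from an untrusted source, this is the reason to prefer literal_eval.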

You can also do the conversion directly while reading the CSV file, using the converters argument of read_csv:

import pandas as pd
from ast import literal_eval
from io import StringIO

mystr = StringIO("""a
{'word': 5.7}
{'khfds': 8.34}""")

df = pd.read_csv(mystr, converters={'a': literal_eval})

print(df.iloc[0]['a']['word'])

Answered By: RomaneG
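
Applied to the situation in the question, a full round trip might look like the sketch below. It assumes the dataframe was saved with to_csv using default settings, and it writes to an in-memory buffer instead of foo.csv so that it runs on its own.

```python
import pandas as pd
from ast import literal_eval
from io import StringIO

# Dataframe shaped like the one in the question.
original = pd.DataFrame({"a": [{"word": 5.7}, {"khfds": 8.34}]})

# Stand-in for foo.csv: write to an in-memory buffer, then rewind it.
buffer = StringIO()
original.to_csv(buffer)
buffer.seek(0)

# index_col=0 restores the saved index; the converter parses each cell of "a".
restored = pd.read_csv(buffer, index_col=0, converters={"a": literal_eval})

print(restored)
print(type(restored.loc[0, "a"]))
```

With a real file, the same call would presumably be pd.read_csv('foo.csv', index_col=0, converters={'a': literal_eval}).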

(I don’t have enough reputation to comment.)
Even when using ast.literal_eval, I got "ValueError: malformed node or string" on some dict columns.

Fixing the spacing in the dicts resolved the issue for me.
Example:

before

ast.literal_eval("{'word' : 5.7}, {'khfds' : 8.34}")

after

ast.literal_eval("{'word': 5.7}, {'khfds': 8.34}")

Hope this helps someone.

Answered By: Abhijith M
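
As far as I know, literal_eval itself accepts a space before the colon, so when "malformed node or string" shows up it may be worth checking for cells that are not strings (for example NaN from an empty cell) or that contain something other than a plain literal. A minimal sketch of a tolerant parser follows; parse_cell is a hypothetical helper name and the sample values are made up.

```python
import pandas as pd
from ast import literal_eval

def parse_cell(value):
    """Hypothetical helper: parse a cell if possible, else return it unchanged."""
    if not isinstance(value, str):
        return value  # e.g. float NaN read from an empty cell
    try:
        return literal_eval(value.strip())
    except (ValueError, SyntaxError):
        return value  # leave unparseable text as-is for later inspection

# Made-up column: a dict with extra spacing, a missing value,
# and a cell holding a non-literal (a function-call repr).
raw = pd.Series(["{'word' : 5.7}", float("nan"), "{'khfds': array([8.34])}"])
result = raw.apply(parse_cell)

print(result)

# Both spacing variants from the answer above parse to the same tuple of dicts.
both = literal_eval("{'word' : 5.7}, {'khfds' : 8.34}")
print(both)
```

Cells that come back unchanged can then be filtered out and examined to find the actual cause of the error.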