Reading pandas dataframe that contains dictionaries in cells from csv
Question:
I saved a pandas dataframe that looks like the following as a csv file.
a
0 {'word': 5.7}
1 {'khfds': 8.34}
When I attempt to read the dataframe as shown below, I receive the following error.
df = pd.read_csv('foo.csv', index_col=0, dtype={'str': 'dict'})
TypeError: data type "dict" not understood
The heart of my question is: how do I read the CSV file so that the dataframe is recovered in the same form as when it was created? I have also tried reading without dtype={} and replacing 'dict' with alternatives such as 'dictionary', 'object', and 'str'.
Answers:
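CSV files can only contain text, so a dictionary is written out as its string representation rather than as a dict. The dtype argument accepts only NumPy or pandas dtypes, and dict is not one of them, hence the TypeError. You therefore need to parse the text back into a dict yourself. One way is to use ast.literal_eval: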
import pandas as pd
from ast import literal_eval
from io import StringIO
mystr = StringIO("""a
{'word': 5.7}
{'khfds': 8.34}""")
df = pd.read_csv(mystr)
df['a'] = df['a'].apply(literal_eval)
print(df['a'].apply(lambda x: type(x)))
0 <class 'dict'>
1 <class 'dict'>
Name: a, dtype: object
However, I strongly recommend against using pandas to store dictionaries. A column of dicts has dtype object and only holds pointers to Python objects, whereas pandas works best with contiguous memory blocks. Where possible, split the numeric data out into separate numeric series.
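To illustrate the recommendation above, here is a minimal sketch of one way to split the dictionaries into separate numeric columns. It assumes each cell holds a flat dict of numbers, as in the question; the names csv_text and expanded are only illustrative.

```python
import pandas as pd
from ast import literal_eval
from io import StringIO

# Same sample data as in the question (illustrative variable name).
csv_text = StringIO("""a
{'word': 5.7}
{'khfds': 8.34}""")

df = pd.read_csv(csv_text, converters={'a': literal_eval})

# Build one column per dictionary key; a key that is absent
# from a given row shows up as NaN in that row.
expanded = pd.DataFrame(df['a'].tolist(), index=df.index)

print(expanded)
print(expanded.dtypes)
```

Each resulting column is an ordinary numeric series, so vectorised operations work on it directly.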
You may also use Python's built-in eval, but only on files you trust, because eval executes whatever expression a cell contains:
import pandas as pd
from io import StringIO
mystr = StringIO("""a
{'word': 5.7}
{'khfds': 8.34}""")
df = pd.read_csv(mystr)
df['a'] = df['a'].apply(eval)
print(df['a'].apply(lambda x: type(x)))
0 <class 'dict'>
1 <class 'dict'>
Name: a, dtype: object
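As a rough illustration of how eval and ast.literal_eval differ, the sketch below feeds both a plain dict literal and a cell that contains code instead of a literal. The strings cell and untrusted are made up for the example.

```python
from ast import literal_eval

# For a plain dict literal the two functions agree.
cell = "{'word': 5.7}"
same_result = literal_eval(cell) == eval(cell)

# A cell that contains a function call rather than a literal.
# eval would run it; literal_eval refuses and raises ValueError.
untrusted = "__import__('os').getcwd()"
try:
    literal_eval(untrusted)
    rejected = False
except ValueError:
    rejected = True

print(same_result, rejected)
```

For CSV files that come from anywhere other than your own code, literal_eval is the safer of the two.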
You can also convert to dictionaries directly while reading the CSV file, using the converters argument:
import pandas as pd
from ast import literal_eval
from io import StringIO
mystr = StringIO("""a
{'word': 5.7}
{'khfds': 8.34}""")
df = pd.read_csv(mystr, converters={'a': literal_eval})
print(df.iloc[0]['a']['word'])
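Applied to the situation in the question, where the file was written with its index, a round trip might look like the sketch below. It uses an in-memory buffer in place of foo.csv so that it runs on its own; the names original, buffer and restored are illustrative.

```python
import pandas as pd
from ast import literal_eval
from io import StringIO

original = pd.DataFrame({'a': [{'word': 5.7}, {'khfds': 8.34}]})

# Write to an in-memory buffer instead of 'foo.csv'.
buffer = StringIO()
original.to_csv(buffer)
buffer.seek(0)

# index_col=0 restores the saved index; the converter parses
# each cell of column 'a' back into a dict.
restored = pd.read_csv(buffer, index_col=0, converters={'a': literal_eval})

print(restored['a'].tolist() == original['a'].tolist())
print(type(restored.loc[0, 'a']))
```

Note that the converters key is the column name ('a'), not a type name as in the dtype attempt from the question.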
Even with ast.literal_eval I got "ValueError: malformed node or string" on some dict columns.
Fixing the spacing in the dict strings fixed the issue for me.
Example:
Before:
ast.literal_eval("{'word' : 5.7}, {'khfds' : 8.34}")
After:
ast.literal_eval("{'word': 5.7}, {'khfds': 8.34}")
Hope this helps someone.
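For anyone else hitting "malformed node or string": literal_eval itself tolerates a space before the colon, so the error can also have other causes. One possible cause (an assumption, not necessarily what happened above) is an empty cell, which pandas reads as NaN, a float rather than a string. The sketch below shows that case; parse_dict_cell is a hypothetical helper, and the id column is added only so that the empty cell is unambiguous.

```python
import pandas as pd
from ast import literal_eval
from io import StringIO

# Whitespace before the colon parses without error.
spaced = literal_eval("{'word' : 5.7}")

# Row 2 has an empty cell in column 'a', which is read as NaN.
csv_text = StringIO("""id,a
1,{'word': 5.7}
2,
3,{'khfds': 8.34}""")
df = pd.read_csv(csv_text)

try:
    df['a'].apply(literal_eval)
    failed = False
except ValueError:
    failed = True  # NaN is not a string literal, so it is rejected

def parse_dict_cell(cell):
    """Hypothetical helper: parse a dict literal, mapping missing cells to None."""
    if pd.isna(cell):
        return None
    return literal_eval(cell)

df['a'] = df['a'].apply(parse_dict_cell)
print(failed)
print(df['a'].tolist())
```

If the error persists after handling missing cells, printing the offending cell with repr() is a quick way to see what literal_eval is actually being given.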