Incorrectly Reading a Column Containing Lists in Pandas

Question:

I have a pandas DataFrame that I am reading from a CSV file, and one of its columns contains lists. For example, the column in the CSV looks like this:

ColName2007
=============
['org1', 'org2']
['org2', 'org3']
...

When I read this column into pandas, each entry of the column is treated as a string rather than a list of strings.

df['ColName2007'][0] returns "['org1', 'org2']". Notice this is being stored as a string, not a list of strings.

I want to be able to perform list operations on this data. What is a good way to quickly and efficiently convert this column of strings into a column of lists that contain strings?

Asked By: 324

Answers:

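I would use strip/split, also removing the quote characters so that each element is a clean string:

df['ColName2007'] = df['ColName2007'].str.strip("[]").str.replace("'", "", regex=False).str.split(", ")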
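
As a self-contained illustration of the string-splitting idea, here is a minimal sketch on made-up sample data. The helper name parse_list_string is hypothetical and not part of pandas; it trims whitespace and quote characters from each element and assumes no element contains a comma.

```python
import pandas as pd

# Hypothetical sample data mimicking the column as read from the CSV.
df = pd.DataFrame({"ColName2007": ["['org1', 'org2']", "['org2', 'org3']"]})


def parse_list_string(text):
    """Split a string like "['a', 'b']" into ['a', 'b'] without evaluating it."""
    inner = text.strip("[]")
    if not inner.strip():
        return []  # the cell held an empty list, written as "[]"
    # Trim surrounding whitespace and quote characters from each piece.
    return [item.strip().strip("'\"") for item in inner.split(",")]


df["ColName2007"] = df["ColName2007"].apply(parse_list_string)
print(df["ColName2007"][0])  # ['org1', 'org2']
```

This avoids evaluating the cell contents, but it will split incorrectly if an organisation name itself contains a comma.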

Alternatively, you can apply ast.literal_eval, as suggested by @Bjay Regmi in the comments. It parses each string as a Python literal, so the brackets and quotes are handled for you:

import ast

df["ColName2007"] = df["ColName2007"].apply(ast.literal_eval)
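
If the data always comes from a CSV, the same conversion can probably be done at read time through the converters argument of pd.read_csv. The sketch below uses an in-memory CSV with made-up contents in place of the real file; the names csv_text and df_lists are only for illustration.

```python
import ast
import io

import pandas as pd

# Hypothetical CSV content; the list cells are quoted because they contain commas.
csv_text = """ColName2007
"['org1', 'org2']"
"['org2', 'org3']"
"""

# The converter is called on the raw string of every cell in the named column.
df_lists = pd.read_csv(
    io.StringIO(csv_text),
    converters={"ColName2007": ast.literal_eval},
)

first = df_lists["ColName2007"][0]
print(type(first), first)  # <class 'list'> ['org1', 'org2']
```

Note that ast.literal_eval raises an error on strings that are not valid Python literals, so empty or malformed cells would need separate handling.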
Answered By: abokey