How to convert a dictionary of lists into a pandas DataFrame?

Question:

new_data = {'mid':mids, 'human':all_tags, 'new':new_tags, 'old':old_tags}
df = pd.DataFrame(new_data.items(), columns=['mid', 'human', 'new', 'old'])

new_data is a dictionary in which each value is a list, and all the lists have the same length. I tried to convert it into a DataFrame, but I get this error:

ValueError: 4 columns passed, passed data had 2 columns

How do I convert new_data into a DataFrame?

Asked By: marlon

Answers:

Remove .items():

import pandas as pd

new_data = {'mid': [1, 2], 'human': [1, 2], 'new': [1, 2], 'old': [1, 2]}
df = pd.DataFrame(new_data, columns=['mid', 'human', 'new', 'old'])

Note:

Passing columns here is redundant, because the names are the same as the dictionary keys and appear in the same order. So just use:

>>> pd.DataFrame(new_data)

   mid  human  new  old
0    1      1    1    1
1    2      2    2    2
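
As a side note on the columns argument: when the data is a dictionary, columns does not rename anything. It selects keys and sets their order. A minimal sketch, assuming distinct illustrative values (not the asker's data) so that the columns can be told apart:

```python
import pandas as pd

# Illustrative values only; each key gets different numbers
# so the selected columns are easy to identify.
new_data = {'mid': [1, 2], 'human': [3, 4], 'new': [5, 6], 'old': [7, 8]}

# With a dict, `columns` picks keys and fixes their order.
subset = pd.DataFrame(new_data, columns=['old', 'mid'])

print(list(subset.columns))    # ['old', 'mid']
print(subset['old'].tolist())  # [7, 8]
```

So columns is only worth passing with a dict when you want a subset of the keys or a different order.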

The reason behind the error:

If you try this, here is what you’ll get:

>>> pd.DataFrame(new_data.items())

       0       1
0    mid  [1, 2]
1  human  [1, 2]
2    new  [1, 2]
3    old  [1, 2]

Why?

Check this:

>>> list(new_data.items())

[('mid', [1, 2]), ('human', [1, 2]), ('new', [1, 2]), ('old', [1, 2])]

This is a "list of lists" (strictly speaking, a list of tuples). When pd.DataFrame() receives such input, it treats each inner item as one row. Each tuple has two elements, the key and its list, so the resulting frame has only two columns. That is why assigning the column names fails: there are 2 columns, but you are providing 4 column names.
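
The difference between the two interpretations can be checked by comparing shapes. This is a small sketch using the same sample data as above; the variable names by_rows and by_columns are just for illustration:

```python
import pandas as pd

new_data = {'mid': [1, 2], 'human': [1, 2], 'new': [1, 2], 'old': [1, 2]}

# A list of (key, value) tuples is read row by row:
# one row per dictionary entry, two columns (key, list).
by_rows = pd.DataFrame(list(new_data.items()))
print(by_rows.shape)  # (4, 2)

# A dict of lists is read column by column:
# one column per key, one row per list element.
by_columns = pd.DataFrame(new_data)
print(by_columns.shape)          # (2, 4)
print(list(by_columns.columns))  # ['mid', 'human', 'new', 'old']
```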

Answered By: Vladimir Fokow