Pandas: create named columns in DataFrame from dict
Question:
I have a dictionary object of the form:
my_dict = {id1: val1, id2: val2, id3: val3, ...}
I want to turn this into a DataFrame with two columns named ‘business_id’ and ‘business_code’.
I tried:
business_df = DataFrame.from_dict(my_dict,orient='index',columns=['business_id','business_code'])
But it says from_dict doesn’t take a columns argument.
TypeError: from_dict() got an unexpected keyword argument ‘columns’
Answers:
You can iterate through the items:
In [11]: pd.DataFrame(list(my_dict.items()),
columns=['business_id','business_code'])
Out[11]:
business_id business_code
0 id2 val2
1 id3 val3
2 id1 val1
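A minimal runnable sketch of the same approach, assuming hypothetical string ids and values in place of id1, val1 and so on. On Python 3.7+ a dict keeps insertion order, so the rows should come out in the order the keys were added rather than the shuffled order shown above.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's my_dict
my_dict = {"id1": "val1", "id2": "val2", "id3": "val3"}

# Each (key, value) pair becomes one row; `columns` names the two fields
business_df = pd.DataFrame(list(my_dict.items()),
                           columns=["business_id", "business_code"])
print(business_df)
```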
Do this:
Create the DataFrame:
df = pd.DataFrame(data_as_2d_ndarray)
Create a list of column names sorted by dictionary key. Adjust the key argument as needed to pull the sorting value from your dict. Obviously, the dictionary and the data must have consistent shapes.
col_names = sorted(col_dict.items(), key=lambda x: x[0])
Set the column names from the sorted pairs:
df.columns = [name for _, name in col_names]
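A Python 3 sketch of these steps, assuming a small hypothetical array and a hypothetical col_dict that maps column position to column name (the answer does not say what col_dict actually holds):

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: a 2-D array and a position -> name mapping
data_as_2d_ndarray = np.array([[1, 10], [2, 20], [3, 30]])
col_dict = {1: "business_code", 0: "business_id"}

df = pd.DataFrame(data_as_2d_ndarray)

# Sort the (position, name) pairs by position, then keep only the names
col_names = sorted(col_dict.items(), key=lambda kv: kv[0])
df.columns = [name for _, name in col_names]
print(df)
```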
To get the same functionality as the documentation and avoid using code workarounds, make sure you’re using the most recent version of Pandas. I recently encountered the same error when running a line of code from the Pandas tutorial:
pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]),orient='index', columns=['one', 'two', 'three'])
I checked the version of Pandas and found I was running version 0.22, when version 0.23 was available.
import pandas as pd
pd.__version__
Out[600]: '0.22.0'
I upgraded using pip:
pip install --upgrade pandas
I confirmed my version updated to 0.23, and the same from_dict() code worked without error. No code modifications required.
This concerns the TypeError you got. According to the Pandas documentation, from_dict accepts the ‘columns’ keyword only when orient=‘index’.
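To illustrate that restriction, here is a small sketch assuming pandas 0.23 or later and the sample data from the tutorial line quoted earlier. With orient='index' the call succeeds; with the default orient='columns', recent pandas versions reject the columns argument with a ValueError.

```python
import pandas as pd

data = {"A": [1, 2, 3], "B": [4, 5, 6]}

# orient='index': each key becomes a row label, `columns` names the fields
by_index = pd.DataFrame.from_dict(data, orient="index",
                                  columns=["one", "two", "three"])

# Default orient='columns': passing `columns` is rejected
try:
    pd.DataFrame.from_dict(data, columns=["one", "two", "three"])
    rejected = False
except ValueError:
    rejected = True

print(by_index)
print("columns rejected with default orient:", rejected)
```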
From version 0.23.0, you can specify a columns parameter in from_dict:
my_dict = {id1: val1, id2: val2, id3: val3, ...}
prepared_dict = {i: x for i, x in enumerate(my_dict.items())}
df = pd.DataFrame.from_dict(prepared_dict, orient='index', columns=['business_id', 'business_code'])
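A runnable version of the snippet above, assuming pandas 0.23 or later and hypothetical string ids and values. The enumerate step is what makes it work: each (id, value) pair is keyed by its position, so from_dict sees one two-element row per entry.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's my_dict
my_dict = {"id1": "val1", "id2": "val2", "id3": "val3"}

# Key each (id, value) pair by its position so it becomes one row
prepared_dict = {i: pair for i, pair in enumerate(my_dict.items())}

df = pd.DataFrame.from_dict(prepared_dict, orient="index",
                            columns=["business_id", "business_code"])
print(df)
```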
Note: I also answered in kind on this similar question.