Pandas copy column names from one dataframe to another
Question:
Let’s say that we have two pandas data frames. The first one hasn’t got column names:
no_col_names_df = pd.DataFrame(np.array([[1,2,3], [4,5,6], [7,8,9]]))
The second has:
col_names_df = pd.DataFrame(np.array([[10,2,3], [4,45,6], [7,18,9]]),
columns=['col1', 'col2', 'col3'])
What I want to do is copy the column names from col_names_df to no_col_names_df so that the following data frame is created:
   col1  col2  col3
0     1     2     3
1     4     5     6
2     7     8     9
I’ve tried the following:
new_df_with_col_names = pd.DataFrame(data=no_col_names_df, columns=col_names_df.columns)
but instead of the values from no_col_names_df I get NaNs.
Answers:
The simplest way is to assign the columns of col_names_df directly to those of no_col_names_df:
no_col_names_df.columns = col_names_df.columns
   col1  col2  col3
0     1     2     3
1     4     5     6
2     7     8     9
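Note that the assignment above relabels no_col_names_df in place. If the original frame should stay untouched, one option is DataFrame.set_axis, which in recent pandas versions returns a new frame. A minimal sketch, reusing the two frames from the question (the name renamed_df is only illustrative):

```python
import numpy as np
import pandas as pd

no_col_names_df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
col_names_df = pd.DataFrame(np.array([[10, 2, 3], [4, 45, 6], [7, 18, 9]]),
                            columns=['col1', 'col2', 'col3'])

# set_axis returns a relabelled frame; no_col_names_df keeps its 0, 1, 2 labels
renamed_df = no_col_names_df.set_axis(col_names_df.columns, axis=1)

print(renamed_df)
print(list(no_col_names_df.columns))  # still the default integer labels
```

Both this and the direct assignment expect the number of labels to match the number of columns; otherwise pandas raises a ValueError.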
If you're getting NaN, the issue is most likely the data parameter. Try this:
new_df_with_col_names = pd.DataFrame(data=no_col_names_df.values, columns=col_names_df.columns)
output:
   col1  col2  col3
0     1     2     3
1     4     5     6
2     7     8     9
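One thing to keep in mind with this approach: passing only the raw values discards the row index as well as the column labels. With the default 0, 1, 2 index from the question that makes no difference, but for a frame with a custom index it may matter. A small sketch, assuming a hypothetical index of 'a', 'b', 'c' and using to_numpy() in place of .values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-default index, to show what the raw values drop
no_col_names_df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                               index=['a', 'b', 'c'])
col_names = ['col1', 'col2', 'col3']

# to_numpy() hands over only the values, so the index is passed separately
new_df_with_col_names = pd.DataFrame(data=no_col_names_df.to_numpy(),
                                     columns=col_names,
                                     index=no_col_names_df.index)

print(new_df_with_col_names)
```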
Just like you have used columns from the dataframe with column names, you can use values from the dataframe without column names:
new_df_with_col_names = pd.DataFrame(data=no_col_names_df.values, columns=col_names_df.columns)
In [4]: new_df_with_col_names = pd.DataFrame(data=no_col_names_df, columns=col_names_df.columns)
In [5]: new_df_with_col_names
Out[5]:
   col1  col2  col3
0   NaN   NaN   NaN
1   NaN   NaN   NaN
2   NaN   NaN   NaN
In [6]: new_df_with_col_names = pd.DataFrame(data=no_col_names_df.values, columns=col_names_df.columns)
In [7]: new_df_with_col_names
Out[7]:
   col1  col2  col3
0     1     2     3
1     4     5     6
2     7     8     9
This:
pd.DataFrame(data=no_col_names_df, columns=col_names_df.columns)
gives you an all-NaN dataframe because you are passing a dataframe to construct a new dataframe and requesting new columns for it. Pandas essentially constructs an identical dataframe and then reindexes it along axis 1. None of the requested labels exist in no_col_names_df, so every column comes back filled with NaN. In other words, that command is equivalent to:
no_col_names_df.reindex(col_names_df.columns, axis=1)
You need to either assign to no_col_names_df.columns directly or pass no_col_names_df.values as the data.
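The label-matching behaviour described here can be checked with a short sketch. The variable names (new_cols, from_constructor, from_reindex, subset) are illustrative and not part of the question:

```python
import numpy as np
import pandas as pd

no_col_names_df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
new_cols = ['col1', 'col2', 'col3']

# None of the requested labels exist in the source frame, so every cell is NaN
from_constructor = pd.DataFrame(data=no_col_names_df, columns=new_cols)
from_reindex = no_col_names_df.reindex(new_cols, axis=1)
print(from_constructor.isna().all().all())
print(from_reindex.isna().all().all())

# Labels that do exist in the source frame are matched and keep their values
subset = pd.DataFrame(data=no_col_names_df, columns=[0, 2])
print(subset)
```

So when data is already a DataFrame, the columns argument selects by label rather than renaming by position, which is why the constructor call in the question cannot be used to rename columns.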
I tried the simplest one and it worked for me:
no_col_names_df.columns = col_names_df.columns