Pandas: control new column names when merging two dataframes?


I would like to merge two Pandas dataframes together and control the names of the new column values.

I originally created the dataframes from CSV files. The original CSV files looked like this:

   # presents.csv
   12B,Liverpool Street,37,212,...
   # trees.csv
   12B,Liverpool Street,2,92,...

Now I have two data frames:

df_presents = pd.read_csv(StringIO(presents_txt))
df_trees = pd.read_csv(StringIO(trees_txt))

I want to merge them together to get a final data frame, joining on the org and name values, and then prefixing all other columns with an appropriate prefix.

12B,Liverpool Street,37,212,2,92,...

I’ve been reading the documentation on merging and joining. This seems to merge correctly and result in the right number of columns:

ad = pd.DataFrame.merge(df_presents, df_trees,
                        on=['practice', 'name'],

But then doing print list(aggregate_data.columns.values) shows me the following columns:

[org', u'name', u'spend_x', u'spend_y', u'items_x', u'items_y'...]

How can I rename spend_x to be presents_spend, etc?

Asked By: Richard



The suffixes option in the merge function does this. The defaults are suffixes=('_x', '_y').

In general, renaming columns can be done with the rename method.

Answered By: itzy

You can rename all the columns of ad by setting its columns as follows.

ad.columns = ['org', 'name', 'presents_spend', 'trees_spend']
Answered By: Nguyen Ngoc Tuan

Another way is adding suffix to the columns of your dataframe before merging:

ad.columns = 'ad_' + ad.columns.values
Answered By: Amirkhm
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.