How to add *or* update columns in a pandas DataFrame?

Question:

I have an existing DataFrame, and a method that computes a few columns to add to that DataFrame. I currently use pd.concat([left, right], axis=1). When I call this method a second time, however, it adds the columns again (with the same name).

With the following sample data frames left and right:

left = pd.DataFrame({'one': [1, 2, 3], 'two': [2, 3, 4]})
print(left)

   one  two
0    1    2
1    2    3
2    3    4

right = pd.DataFrame({'one': [22, 22, 22], 'NEW': [33, 33, 33]})
print(right)

   one  NEW
0   22   33
1   22   33
2   22   33

I am looking for a foo method whose result is the following:

left = left.foo(right)  # or foo(left, right)
print(left)

   one  two  NEW
0   22    2   33
1   22    3   33
2   22    4   33

And, importantly, if I call left.foo(right) a second time, I want the result to stay the same.

DataFrame.join raises an error when a column already exists, pd.concat doesn’t overwrite existing columns, and DataFrame.update only overwrites existing columns but doesn’t add new ones.

Is there a function/method to do what I want or do I have to write one myself?


Solution: The solution that worked for me, combined from the two answers below, is:

result = (
    left
    .drop(left.columns.intersection(right.columns), axis=1)
    .join(right)
)
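
The drop-then-join idea can be wrapped in a small function to check that a second call leaves the result unchanged. This is only a sketch: the name add_or_update_columns is hypothetical, and it assumes both frames share the same index.

```python
import pandas as pd


def add_or_update_columns(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `left` with the columns of `right` added or replaced.

    Hypothetical helper name; rows are matched on the index.
    """
    # Columns present in both frames are removed from `left` first,
    # so that join() never sees a name clash.
    shared = left.columns.intersection(right.columns)
    return left.drop(columns=shared).join(right)


left = pd.DataFrame({'one': [1, 2, 3], 'two': [2, 3, 4]})
right = pd.DataFrame({'one': [22, 22, 22], 'NEW': [33, 33, 33]})

once = add_or_update_columns(left, right)
twice = add_or_update_columns(once, right)

print(once.equals(twice))  # True
```

Note that the overwritten column moves to the end, so the column order is two, one, NEW rather than one, two, NEW.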

Answers:

Take the intersection of the column names, drop those columns from left, then merge on the index:

left = left.drop(left.columns.intersection(right.columns), axis=1).merge(right, left_index=True, right_index=True)

print(left)
   two  one  NEW
0    2   22   33
1    3   22   33
2    4   22   33
Answered By: Space Impact

Alternative solution, but it only adds new columns; it does not overwrite existing ones:

left = pd.concat([left, right[right.columns.difference(left.columns)]], axis=1)
print(left)
   one  two  NEW
0    1    2   33
1    2    3   33
2    3    4   33
Answered By: jezrael

This is a simple method that will update existing columns
or add new ones if needed:

left.loc[right.index, right.columns] = right
print(left)

   one  two  NEW
0   22    2   33
1   22    3   33
2   22    4   33

The index keys from right must be in left already, but the columns from right will be added if needed.
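
To illustrate the remark about index keys, here is a minimal sketch in which right covers only some of the rows in left. The two-row right frame is an assumed variation of the sample data, and the loop is a per-column variant of the .loc assignment rather than the exact one-liner from this answer.

```python
import pandas as pd

left = pd.DataFrame({'one': [1, 2, 3], 'two': [2, 3, 4]})
# Assumed variation of the sample data: `right` covers only rows 0 and 1.
right = pd.DataFrame({'one': [22, 22], 'NEW': [33, 33]}, index=[0, 1])

# Per-column variant of the .loc assignment: existing columns are
# overwritten on the matching rows, missing columns are created.
for col in right.columns:
    left.loc[right.index, col] = right[col]

print(left)
```

Row 2 is not in right, so it keeps its original value in one and gets a missing value in NEW.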

Answered By: Matthias Fripp

Thanks for the solution. I just wanted to add one small change: if right has more rows than left, the proposed solutions will not keep the extra rows. The fix is simple, though: just add how="right" to the join:

result = left.drop(left.columns.intersection(right.columns), axis=1).join(right, how="right")
Answered By: Henrik Larsen
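
A short sketch of the how="right" case, assuming a right frame with one more row than left (the fourth row is made up for illustration):

```python
import pandas as pd

left = pd.DataFrame({'one': [1, 2, 3], 'two': [2, 3, 4]})
# Assumed variation of the sample data: `right` has one extra row (index 3).
right = pd.DataFrame({'one': [22, 22, 22, 22], 'NEW': [33, 33, 33, 33]})

shared = left.columns.intersection(right.columns)
# how="right" keeps every row of `right`, including those missing from `left`.
result = left.drop(columns=shared).join(right, how="right")

print(result)
```

Row 3 exists only in right, so its value in two is missing, which also turns that column into floats.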