Outer merging two data frames in place in pandas

Question:

How can I outer merge two data frames in place in pandas?

For example, assume we have these two data frames:

import pandas as pd

s1 = pd.DataFrame({
    'time': [1234567000, 1234567005, 1234567009],
    'X1': [96.32, 96.01, 96.05]
}, columns=['time', 'X1'])  # to keep column order

s2 = pd.DataFrame({
    'time': [1234567001, 1234567005],
    'X2': [23.88, 23.96]
}, columns=['time', 'X2'])  # to keep column order

They could be merged with pandas.merge (s3 = pd.merge(s1, s2, how='outer')) or with pandas.DataFrame.merge (s3 = s1.merge(s2, how='outer')), but neither works in place. Instead, I'd like the merged data frame to replace s1 in memory.
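For reference, here is a minimal sketch of what the outer merge produces for these two frames (rebuilt so the snippet runs on its own). It shows the problem: merge returns a new DataFrame and leaves s1 untouched. The row order of the result is not relied on here.

```python
import pandas as pd

s1 = pd.DataFrame({'time': [1234567000, 1234567005, 1234567009],
                   'X1': [96.32, 96.01, 96.05]})
s2 = pd.DataFrame({'time': [1234567001, 1234567005],
                   'X2': [23.88, 23.96]})

# Outer merge on the shared 'time' column: every time from either frame
# appears once, with NaN where a frame has no value for that time
s3 = pd.merge(s1, s2, how='outer')
print(s3)

print(s3 is s1)   # False: merge returns a new DataFrame
print(s1.shape)   # (3, 2): s1 itself is unchanged
```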

Asked By: Franck Dernoncourt


Answers:

Since pandas.merge has no inplace parameter, I think the most you can do is:

s1 = pd.merge(s1, s2, how='outer')

Other than that, I don't think there's much left to do.
Hope that was helpful somehow.
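A minimal sketch of what that reassignment does: it rebinds the name s1 to a new DataFrame rather than modifying the old one, and the old frame is only freed once nothing else refers to it. The name alias below is a hypothetical second reference, added only to make this visible.

```python
import pandas as pd

s1 = pd.DataFrame({'time': [1234567000, 1234567005, 1234567009],
                   'X1': [96.32, 96.01, 96.05]})
s2 = pd.DataFrame({'time': [1234567001, 1234567005],
                   'X2': [23.88, 23.96]})

alias = s1          # hypothetical second reference to the original frame
old_id = id(s1)

# Rebind the name s1 to the merged result (a brand-new DataFrame)
s1 = pd.merge(s1, s2, how='outer')

print(id(s1) == old_id)  # False: s1 now names a different object
print(alias.shape)       # (3, 2): the original frame is still alive via alias
```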

Answered By: Rayhane Mama

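The accepted answer is correct: there is no built-in way to do this. Here are a couple of ways I found to fake it in certain scenarios. They modify the existing s1 object instead of creating a new one, but they are likely quite slow, and each row appended with .loc still makes pandas copy the frame's data internally, so the memory savings are limited. Use at your own risk.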

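# Add the new column, initially all NaN
s1["X2"] = float("NaN")

for _, row in s2.iterrows():
    if row.time in s1.time.values:
        # time already present in s1: fill in X2 on the matching row(s)
        s1.loc[s1.time == row.time, "X2"] = row.X2
    else:
        # time missing from s1: append the row (assumes a default 0..n-1 index);
        # X1 is NaN for the new row, and s1.time may be upcast to float64
        s1.loc[len(s1), :] = row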
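To see what the loop above achieves, here is a hedged check (alias, expected, got and want are names added for the illustration). It runs the loop on fresh copies of the frames and compares the result with the ordinary outer merge. The same object is modified, and its rows match the merge, but appended rows land at the end and the time column may be upcast to float, so the comparison sorts by time and casts to float first.

```python
import pandas as pd

s1 = pd.DataFrame({'time': [1234567000, 1234567005, 1234567009],
                   'X1': [96.32, 96.01, 96.05]})
s2 = pd.DataFrame({'time': [1234567001, 1234567005],
                   'X2': [23.88, 23.96]})

expected = pd.merge(s1, s2, how='outer')  # reference result (a new object)
alias = s1                                # second reference to the same frame

# The loop from the answer
s1["X2"] = float("NaN")
for _, row in s2.iterrows():
    if row.time in s1.time.values:
        s1.loc[s1.time == row.time, "X2"] = row.X2
    else:
        s1.loc[len(s1), :] = row

print(s1 is alias)  # True: the original object was modified

# Compare ignoring row order and int/float differences in 'time'
got = s1.sort_values("time").reset_index(drop=True)
want = expected.sort_values("time").reset_index(drop=True)
print(got.astype(float).equals(want.astype(float)))
```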

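or, handling matching and non-matching times in two separate passes (this also assumes X2 was initialized to NaN as above):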

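# Pass 1: times present in both frames -> fill in X2
for _, row in s2.loc[s2.time.isin(s1.time)].iterrows():
    s1.loc[s1.time == row.time, "X2"] = row.X2

# Pass 2: times only in s2 -> append them as new rows
for _, row in s2.loc[~s2.time.isin(s1.time)].iterrows():
    s1.loc[len(s1), :] = row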
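As a variation on the two-pass version (a sketch, not the answerer's code), the first pass can be vectorized with Series.map, which looks up every s1 time in s2 in one step; this assumes the times in s2 are unique. Only the times missing from s1 still need the row-by-row append, which, like the original, assumes s1 has a default 0..n-1 index.

```python
import pandas as pd

s1 = pd.DataFrame({'time': [1234567000, 1234567005, 1234567009],
                   'X1': [96.32, 96.01, 96.05]})
s2 = pd.DataFrame({'time': [1234567001, 1234567005],
                   'X2': [23.88, 23.96]})
alias = s1  # hypothetical second reference, to confirm the same object is modified

# Pass 1 (vectorized): X2 for each s1 time, NaN where s2 has no such time.
# Assigning a column adds it to the existing s1 object.
s1["X2"] = s1["time"].map(s2.set_index("time")["X2"])

# Pass 2: append the times that only s2 has, one row at a time
for _, row in s2.loc[~s2["time"].isin(s1["time"])].iterrows():
    s1.loc[len(s1), :] = row

print(s1 is alias)  # True
print(s1)
```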

Answered By: Alecg_O