Outer merging two data frames in place in pandas
Question:
How can I outer merge two data frames in place in pandas?
For example, assume we have these two data frames:
import pandas as pd
s1 = pd.DataFrame({
    'time':[1234567000,1234567005,1234567009],
    'X1':[96.32,96.01,96.05]
},columns=['time','X1']) # to keep columns order
s2 = pd.DataFrame({
    'time':[1234567001,1234567005],
    'X2':[23.88,23.96]
},columns=['time','X2']) # to keep columns order
They could be merged with pandas.merge (s3 = pd.merge(s1,s2,how='outer')) or with pandas.DataFrame.merge (s3 = s1.merge(s2,how='outer')), but neither is in place. Instead, I’d like the merged data frame to replace s1 in memory.
Answers:
Since there is no inplace parameter in pandas.merge, I think the most you can do is:
s1 = pd.merge(s1,s2,how='outer')
Other than that, I don’t think there’s much left to do.
Hope that was helpful somehow.
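To make the "not in place" point concrete, here is a minimal sketch using the frames from the question. The name alias is hypothetical, added only to keep a second reference to the original frame. It shows that the assignment rebinds s1 to a new object and leaves the original untouched, so the old frame can only be freed once nothing else refers to it.

```python
import pandas as pd

s1 = pd.DataFrame({
    'time': [1234567000, 1234567005, 1234567009],
    'X1': [96.32, 96.01, 96.05]
}, columns=['time', 'X1'])
s2 = pd.DataFrame({
    'time': [1234567001, 1234567005],
    'X2': [23.88, 23.96]
}, columns=['time', 'X2'])

# Hypothetical second reference to the original frame, for illustration only.
alias = s1

# merge returns a new DataFrame; the assignment only rebinds the name s1.
s1 = pd.merge(s1, s2, how='outer')

print(s1 is alias)          # the merged frame is a different object
print(list(s1.columns))     # columns of the merged frame
print(len(s1), len(alias))  # the original frame still has its own rows
```

If other variables still point at the original frame, as alias does here, they keep seeing the unmerged data.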
The marked answer is correct: there is no built-in way to do this. Here are a couple of ways I found to fake it in certain scenarios. They’re likely quite slow, but they do avoid building a second merged frame, which keeps the memory footprint down. Use at your own risk.
s1["X2"] = float("NaN")
for i, row in s2.iterrows():
    if row.time in s1.time.values:
        s1.loc[s1.time == row.time, "X2"] = row.X2
    else:
        s1.loc[len(s1), :] = row
or
for _, row in s2.loc[s2.time.isin(s1.time)].iterrows():
    s1.loc[s1.time == row.time, "X2"] = row.X2
for _, row in s2.loc[~s2.time.isin(s1.time)].iterrows():
    s1.loc[len(s1), :] = row
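As a rough sketch, the second variant can be wrapped in a function. The name outer_merge_into and its parameters are hypothetical, not part of pandas. It assumes the left frame has a default RangeIndex (so len(left) is a free label) and that the key values are unique in both frames. Note that appended rows come back unsorted and the integer time column is likely to end up as float.

```python
import pandas as pd

def outer_merge_into(left, right, key, col):
    """Hypothetical helper: fold column `col` of `right` into `left`, matching on `key`.

    Mutates `left` and returns nothing. Assumes a default RangeIndex on `left`
    and unique `key` values in both frames.
    """
    left[col] = float("nan")
    known = right[key].isin(left[key])
    # Keys already present in left: fill in the new column.
    for _, row in right.loc[known].iterrows():
        left.loc[left[key] == row[key], col] = row[col]
    # Keys missing from left: append a row; columns absent from right stay NaN.
    for _, row in right.loc[~known].iterrows():
        left.loc[len(left), :] = row

s1 = pd.DataFrame({
    'time': [1234567000, 1234567005, 1234567009],
    'X1': [96.32, 96.01, 96.05]
}, columns=['time', 'X1'])
s2 = pd.DataFrame({
    'time': [1234567001, 1234567005],
    'X2': [23.88, 23.96]
}, columns=['time', 'X2'])

same_object = s1
outer_merge_into(s1, s2, key='time', col='X2')
print(s1 is same_object)  # the name still refers to the original object
print(s1.sort_values('time'))
```

Unlike the reassignment approach, every existing reference to s1 sees the merged data, at the cost of row-by-row Python loops.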