Merging two Pandas DataFrames based on the sequential order of two columns
Question:
I know questions related to this one have been asked multiple times, but I can’t find anything specific to this one. I have gone through pandas.pydata.org/docs/user_guide/merging.html but I still can’t find what I need.
I have two very large output files that I need to merge together, based on the timestamp column of each file. I need the timestamp columns to interleave together in sequential order. Here is an example.
df1
x1 y1 z1
25 12 0.71
16 13 0.63
41 13 0.84
3 14 0.55
25 17 0.49
df2
x2 y2 z2
73 11 0.31
105 12 0.57
64 12 0.86
92 13 0.42
92 15 0.63
81 18 0.74
I need these DataFrames merged based on the sequential order of the y1 and y2 columns.
df3
x3 y3 z3
73 11 0.31
25 12 0.71
105 12 0.57
64 12 0.86
16 13 0.63
41 13 0.84
92 13 0.42
3 14 0.55
92 15 0.63
25 17 0.49
81 18 0.74
So far I have tried using Pandas concat with sort_values.
df3 = pd.concat([df1,df2]).sort_values(by=['y1','y2'], ascending=True)
Unfortunately I keep getting errors this way. I know there’s a way to do this, but I haven’t been able to find it. Can anyone offer advice?
Answers:
The column names differ – you could rename the columns in one of the dataframes so they align.
pd.concat([
df1,
df2.rename(columns=dict(zip(df2.columns, df1.columns)))
]).sort_values("y1")
x1 y1 z1
0 73 11 0.31
0 25 12 0.71
1 105 12 0.57
2 64 12 0.86
1 16 13 0.63
2 41 13 0.84
3 92 13 0.42
3 3 14 0.55
4 92 15 0.63
4 25 17 0.49
5 81 18 0.74
You can pass ignore_index=True to pd.concat if desired.
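For context on why the renaming matters, here is a minimal sketch using the sample data from the question. It assumes the files have already been loaded into df1 and df2 (the variable names unaligned and aligned are only illustrative). When the column names differ, pd.concat takes the union of the columns and fills the gaps with NaN, so no single column holds all of the y values to sort on.

```python
import pandas as pd

df1 = pd.DataFrame({"x1": [25, 16, 41, 3, 25],
                    "y1": [12, 13, 13, 14, 17],
                    "z1": [0.71, 0.63, 0.84, 0.55, 0.49]})
df2 = pd.DataFrame({"x2": [73, 105, 64, 92, 92, 81],
                    "y2": [11, 12, 12, 13, 15, 18],
                    "z2": [0.31, 0.57, 0.86, 0.42, 0.63, 0.74]})

# Without renaming, concat aligns on column names: the result has six
# columns, and every row from df2 has NaN in x1, y1 and z1.
unaligned = pd.concat([df1, df2])
print(unaligned.shape)                # (11, 6)
print(unaligned["y1"].isna().sum())   # 6

# After mapping df2's names onto df1's by position, the rows stack
# into three fully populated columns.
aligned = pd.concat(
    [df1, df2.rename(columns=dict(zip(df2.columns, df1.columns)))]
)
print(aligned.shape)                  # (11, 3)
```

With the columns aligned, sorting on the single y1 column gives the interleaved order.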
To make it easier to combine (concatenate) the two DataFrames vertically, first rename the columns of both so they match:
df1.columns = ['x', 'y', 'z']
df2.columns = ['x', 'y', 'z']
Once the columns are renamed, we can use sort_values on column y. Pass ignore_index=True to concat to give the combined rows a fresh index before sorting.
pd.concat([df1, df2], ignore_index=True).sort_values('y')
Output:
x y z
5 73 11 0.31
0 25 12 0.71
6 105 12 0.57
7 64 12 0.86
1 16 13 0.63
2 41 13 0.84
8 92 13 0.42
3 3 14 0.55
9 92 15 0.63
4 25 17 0.49
10 81 18 0.74
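If you want the final index to run 0..n-1 in sorted order, and rows with equal y values to keep their original relative order (df1 rows before df2 rows), one possible variation is sketched below. It assumes pandas 1.0 or later, where sort_values accepts ignore_index; the kind="stable" choice is my addition, not part of the answers above.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [25, 16, 41, 3, 25],
                    "y": [12, 13, 13, 14, 17],
                    "z": [0.71, 0.63, 0.84, 0.55, 0.49]})
df2 = pd.DataFrame({"x": [73, 105, 64, 92, 92, 81],
                    "y": [11, 12, 12, 13, 15, 18],
                    "z": [0.31, 0.57, 0.86, 0.42, 0.63, 0.74]})

df3 = (
    pd.concat([df1, df2])
      # kind="stable" keeps tied rows in concat order (df1 first);
      # ignore_index=True renumbers the rows after the sort.
      .sort_values("y", kind="stable", ignore_index=True)
)
print(df3)
```

The default sort algorithm is not guaranteed to be stable, so tied timestamps may come out in a different order without kind="stable".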