Merging two dataframe in pandas
Question:
I have following dataframes
df1
ind
stay
res
AB10
yes
yes
AB11
no
yes
df2
ind
date
c
AB10
22-01
1
AB10
24-01
2
AB11
20-01
1
AB11
21-01
2
AB11
22-01
3
I want following dataframe as a final one
ind
date
c
stay
res
AB10
yes
yes
AB10
22-01
1
AB10
24-01
2
AB11
no
yes
AB11
20-01
1
AB11
21-01
2
AB11
22-01
3
How to get this using pandas?
Answers:
I believe what you are looking for is a pd.concat()
for your dataframes
df_con = pd.concat([df1, df2])
df_con.sort_values('ind').fillna('')
This will do what your question asks:
df1 = pd.concat([
df1.reindex(columns=list(df2.columns) + list(df1.columns)[1:]),
df2.reindex(columns=list(df2.columns) + list(df1.columns)[1:])]).sort_values('ind')
Output:
ind date c stay res
0 AB10 NaN NaN yes yes
0 AB10 22-01 1.0 NaN NaN
1 AB10 24-01 2.0 NaN NaN
1 AB11 NaN NaN no yes
2 AB11 20-01 1.0 NaN NaN
3 AB11 21-01 2.0 NaN NaN
4 AB11 22-01 3.0 NaN NaN
If you want empty strings instead of NaN
values, append .fillna('')
:
ind date c stay res
0 AB10 yes yes
0 AB10 22-01 1.0
1 AB10 24-01 2.0
1 AB11 no yes
2 AB11 20-01 1.0
3 AB11 21-01 2.0
4 AB11 22-01 3.0
A slightly shorter alternative would be to use concat()
first and then order the columns:
df1 = pd.concat([df1, df2])[list(df2.columns) + list(df1.columns)[1:]].sort_values('ind').fillna('')
I have following dataframes
df1
ind | stay | res |
---|---|---|
AB10 | yes | yes |
AB11 | no | yes |
df2
ind | date | c |
---|---|---|
AB10 | 22-01 | 1 |
AB10 | 24-01 | 2 |
AB11 | 20-01 | 1 |
AB11 | 21-01 | 2 |
AB11 | 22-01 | 3 |
I want following dataframe as a final one
ind | date | c | stay | res |
---|---|---|---|---|
AB10 | yes | yes | ||
AB10 | 22-01 | 1 | ||
AB10 | 24-01 | 2 | ||
AB11 | no | yes | ||
AB11 | 20-01 | 1 | ||
AB11 | 21-01 | 2 | ||
AB11 | 22-01 | 3 |
How to get this using pandas?
I believe what you are looking for is a pd.concat()
for your dataframes
df_con = pd.concat([df1, df2])
df_con.sort_values('ind').fillna('')
This will do what your question asks:
df1 = pd.concat([
df1.reindex(columns=list(df2.columns) + list(df1.columns)[1:]),
df2.reindex(columns=list(df2.columns) + list(df1.columns)[1:])]).sort_values('ind')
Output:
ind date c stay res
0 AB10 NaN NaN yes yes
0 AB10 22-01 1.0 NaN NaN
1 AB10 24-01 2.0 NaN NaN
1 AB11 NaN NaN no yes
2 AB11 20-01 1.0 NaN NaN
3 AB11 21-01 2.0 NaN NaN
4 AB11 22-01 3.0 NaN NaN
If you want empty strings instead of NaN
values, append .fillna('')
:
ind date c stay res
0 AB10 yes yes
0 AB10 22-01 1.0
1 AB10 24-01 2.0
1 AB11 no yes
2 AB11 20-01 1.0
3 AB11 21-01 2.0
4 AB11 22-01 3.0
A slightly shorter alternative would be to use concat()
first and then order the columns:
df1 = pd.concat([df1, df2])[list(df2.columns) + list(df1.columns)[1:]].sort_values('ind').fillna('')