Join or Merge Pandas Dataframes filling null values
Question:
I have two dataframes, Deployment and HPA, as shown. The columns Deployment-Label and HPA-Label mostly share the same values. I need to merge/join the dataframes on these columns, filling in null values in all other columns wherever a label is missing from either Deployment-Label or HPA-Label.
I tried pd.merge, but it drops the rows that are not common to both dataframes.
import pandas as pd

# Merge the two dataframes on the common LABEL column (after renaming)
df3 = pd.merge(df1, df2, on='LABEL')
# Write the result to a new CSV file
df3.to_csv('output/merged.csv')
Deployment-Label | Deployment Table – Other Columns |
---|---|
accountdataservice | 1 |
accountservice-grpc | 2 |
fmsdataservice | 3 |
fuelbenchmarkservice-grpc | 4 |
fueldeviationservice-grpc | 5 |
httpclientservice | 6 |
packageservice-grpc | 7 |
provisioningdataservice | 8 |
traefik-ingress-ilb | 9 |
translationservice-grpc | 10 |
HPA-Label | HPA Table – Other Columns |
---|---|
accountdataservice | A |
accountservice-grpc | B |
fmsdataservice | C |
fuelbenchmarkservice-grpc | D |
fueldeviationservice-grpc | E |
hangfire | F |
httpclientservice | G |
packageservice-grpc | H |
portalservicerest | I |
provisioningdataservice | J |
schedularservice-grpc | K |
translationservice-grpc | L |
Expected Outcome:
Answers:
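By default, pd.merge performs an inner join (how="inner"), which keeps only the labels present in both dataframes. To keep the non-common rows as well, you can pass how="outer" and sort=True as keyword arguments to pd.merge: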
df3 = pd.merge(df1, df2, on="LABEL", how="outer", sort=True)
# Or, given your MRE:
# df3 = pd.merge(df1, df2, left_on="Deployment-Label",
#                right_on="HPA-Label", how="outer", sort=True)
Output (using the left_on/right_on variant):
print(df3)
             Deployment-Label  Deployment Table - Other Columns                  HPA-Label  HPA Table - Other Columns
0          accountdataservice                               1.0         accountdataservice                          A
1         accountservice-grpc                               2.0        accountservice-grpc                          B
2              fmsdataservice                               3.0             fmsdataservice                          C
3   fuelbenchmarkservice-grpc                               4.0  fuelbenchmarkservice-grpc                          D
4   fueldeviationservice-grpc                               5.0  fueldeviationservice-grpc                          E
5                         NaN                               NaN                   hangfire                          F
6           httpclientservice                               6.0          httpclientservice                          G
7         packageservice-grpc                               7.0        packageservice-grpc                          H
8                         NaN                               NaN          portalservicerest                          I
9     provisioningdataservice                               8.0    provisioningdataservice                          J
10                        NaN                               NaN      schedularservice-grpc                          K
11        traefik-ingress-ilb                               9.0                        NaN                        NaN
12    translationservice-grpc                              10.0    translationservice-grpc                          L
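For a quick self-contained check, the sketch below rebuilds a shortened version of the two tables and runs the same outer merge. The reduced set of rows and the indicator=True flag (which adds a _merge column recording which side each row came from) are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Shortened stand-ins for the two tables in the question (assumed subset of rows).
df1 = pd.DataFrame({
    "Deployment-Label": ["accountdataservice", "httpclientservice", "traefik-ingress-ilb"],
    "Deployment Table - Other Columns": [1, 6, 9],
})
df2 = pd.DataFrame({
    "HPA-Label": ["accountdataservice", "hangfire", "httpclientservice"],
    "HPA Table - Other Columns": ["A", "F", "G"],
})

# how="outer" keeps rows from both sides; indicator=True records their origin
# as "both", "left_only" or "right_only" in a new _merge column.
df3 = pd.merge(
    df1, df2,
    left_on="Deployment-Label", right_on="HPA-Label",
    how="outer", sort=True, indicator=True,
)
print(df3)
```

Rows marked left_only or right_only are exactly the ones the default inner join would have dropped.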
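The snippet in the question merges on a shared LABEL column after renaming. A minimal sketch of that route follows, assuming the column names from the question's tables and a shortened set of rows; merging on one shared key yields a single label column with no gaps, so only the other columns contain NaN.

```python
import pandas as pd

# Shortened stand-ins for the two tables in the question (assumed subset of rows).
df1 = pd.DataFrame({
    "Deployment-Label": ["accountdataservice", "traefik-ingress-ilb"],
    "Deployment Table - Other Columns": [1, 9],
})
df2 = pd.DataFrame({
    "HPA-Label": ["accountdataservice", "hangfire"],
    "HPA Table - Other Columns": ["A", "F"],
})

# Rename both label columns to a common name so they can serve as one join key.
left = df1.rename(columns={"Deployment-Label": "LABEL"})
right = df2.rename(columns={"HPA-Label": "LABEL"})

# The outer join keeps every label; values missing on one side become NaN.
merged = pd.merge(left, right, on="LABEL", how="outer", sort=True)
print(merged)
```

If the NaN placeholders are not wanted in the CSV, merged.fillna(...) with a value of your choice can be applied before writing the file.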