How to merge two data frames so that rows with the same key are combined into one row

Question:

I have the following sample data frames and want to merge them to get the result. I tried outer join, but the result was not what I wanted.

import pandas as pd

df1 = pd.DataFrame(
    {
        "I": ["I1","I2", "I3", "I4"],
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    },
)


df2 = pd.DataFrame(
    {
        "I":["I1","I4", "I5", "I6", "I7"],
        "E": ["A5", "A6", "A7","A8","A9"],
        "F": ["B5", "B6", "B7","B8","B9"],
        "G": ["C5", "C6", "C7","C8","C9"],
        "H": ["D5", "D6", "D7","D8","D9"],
    },
)

result = pd.DataFrame(
    {
        "I": ["I1", "I2", "I3", "I4", "I5", "I6", "I7"],
        "A": ["A0", "A1", "A2", "A3", "00", "00", "00"],
        "B": ["B0", "B1", "B2", "B3", "00", "00", "00"],
        "C": ["C0", "C1", "C2", "C3", "00", "00", "00"],
        "D": ["D0", "D1", "D2", "D3", "00", "00", "00"],
        "E": ["A5", "00", "00", "A6", "A7", "A8", "A9"],
        "F": ["B5", "00", "00", "B6", "B7", "B8", "B9"],
        "G": ["C5", "00", "00",  "C6", "C7", "C8", "C9"],
        "H": ["D5", "00", "00",  "D6", "D7", "D8", "D9"],
    },
)
df1.set_index('I')
df2.set_index('I')
df_merg=pd.concat([df1,df2],join='outer').fillna(0)
print('Result of merge:')
print(df_merg)
print('Expected result')
print(result)

Running the above code generates:

Result of merge:
    I   A   B   C   D   E   F   G   H
0  I1  A0  B0  C0  D0   0   0   0   0
1  I2  A1  B1  C1  D1   0   0   0   0
2  I3  A2  B2  C2  D2   0   0   0   0
3  I4  A3  B3  C3  D3   0   0   0   0
0  I1   0   0   0   0  A5  B5  C5  D5
1  I4   0   0   0   0  A6  B6  C6  D6
2  I5   0   0   0   0  A7  B7  C7  D7
3  I6   0   0   0   0  A8  B8  C8  D8
4  I7   0   0   0   0  A9  B9  C9  D9
Expected result
    I   A   B   C   D   E   F   G   H
0  I1  A0  B0  C0  D0  A5  B5  C5  D5
1  I2  A1  B1  C1  D1  00  00  00  00
2  I3  A2  B2  C2  D2  00  00  00  00
3  I4  A3  B3  C3  D3  A6  B6  C6  D6
4  I5  00  00  00  00  A7  B7  C7  D7
5  I6  00  00  00  00  A8  B8  C8  D8
6  I7  00  00  00  00  A9  B9  C9  D9

As can be seen, the merged data has two rows each for I1 and I4. What I want is a single row per value of I, with the columns from the two data frames placed next to each other.

How can I achieve the merged data frame as shown in the question?

Asked By: mans

||

Answers:

Use how='outer' as a parameter of merge, joining on the I column:

>>> df1.merge(df2, on='I', how='outer').fillna('00')
    I   A   B   C   D   E   F   G   H
0  I1  A0  B0  C0  D0  A5  B5  C5  D5
1  I2  A1  B1  C1  D1  00  00  00  00
2  I3  A2  B2  C2  D2  00  00  00  00
3  I4  A3  B3  C3  D3  A6  B6  C6  D6
4  I5  00  00  00  00  A7  B7  C7  D7
5  I6  00  00  00  00  A8  B8  C8  D8
6  I7  00  00  00  00  A9  B9  C9  D9
Answered By: Corralien

Outer is correct, you can use it in merge:


df = pd.merge(df1, df2, on='I', how='outer')
print(df)
#     I    A    B    C    D    E    F    G    H
# 0  I1   A0   B0   C0   D0   A5   B5   C5   D5
# 1  I2   A1   B1   C1   D1  NaN  NaN  NaN  NaN
# 2  I3   A2   B2   C2   D2  NaN  NaN  NaN  NaN
# 3  I4   A3   B3   C3   D3   A6   B6   C6   D6
# 4  I5  NaN  NaN  NaN  NaN   A7   B7   C7   D7
# 5  I6  NaN  NaN  NaN  NaN   A8   B8   C8   D8
# 6  I7  NaN  NaN  NaN  NaN   A9   B9   C9   D9
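
The output above still has NaN where the question expects "00". A minimal sketch of filling those gaps and checking the result against an expected frame, using shortened stand-ins for the question's data (expected, merged, and matches are illustrative names):

```python
import pandas as pd

# Small stand-ins for the frames in the question
df1 = pd.DataFrame({"I": ["I1", "I2"], "A": ["A0", "A1"]})
df2 = pd.DataFrame({"I": ["I1", "I3"], "E": ["A5", "A7"]})
expected = pd.DataFrame(
    {"I": ["I1", "I2", "I3"], "A": ["A0", "A1", "00"], "E": ["A5", "00", "A7"]}
)

# Outer merge keeps every key from both frames; fillna replaces the gaps
merged = pd.merge(df1, df2, on="I", how="outer").fillna("00")

# Compare column by column as plain lists to sidestep dtype differences
matches = merged.to_dict("list") == expected.to_dict("list")
print(matches)
```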
Answered By: JarroVGIT