Merge two rows and put the results in the same columns

Question:

I would like to join two files by key but I would like some columns to be joined together for example:

File1:

    df1 = pd.DataFrame({'List' : ['P111', 'P999', 'P111;P999;P777', 'P555', 'P666;P111;P333'],
               'Color' : ['red', 'red', 'blue','yellow', 'red']})

        List        Color
0  P111              red
1  P999              red
2  P111;P999;P777    blue
3  P555              yellow
4  P666;P111;P333    red

File2:

    df2 = pd.DataFrame({'Cod' : ['P111', 'P222', 'P333', 'P444', 'P555', 'P666', 'P777'],
               'Animal' : ['DOG', 'CAT', 'BUG','SNAKE,DOG', 'CAT,BUG', 'DOG', 'SNAKE'],
               'Letter' : ['A,F', 'C', 'S,M', 'F,L', 'C,A','M,C', 'Z,L']})

    Cod     Animal     Letter
 0  P111    DOG         A,F
 1  P222    CAT         C
 2  P333    BUG         S,M
 3  P444    SNAKE,DOG   F,L
 4  P555    CAT,BUG     C,A
 5  P666    DOG         M,C
 6  P777    SNAKE       Z,L

I would like to merge with primary key List (from file1) and Cod (from file2), to get:

     List           Color     Animal             Letter
0  P111              red      DOG,FROG           A,F
1  P999              red      -                  -
2  P111;P999;P777    blue     DOG,FROG|-|SNAKE   A,F|-|Z,L
3  P555              yellow   CAT,BUG            C,A 
4  P666;P111;P333    red      DOG|DOG,FROG|BUG   M,C|A,F|S,M

I think we need something like a left join and an agragation but I don’t know how. In the final table I would like that in the values ​​not found there was a . While I would like a | to separate values ​​in "aggregated" columns

Which is the best way?

Asked By: Mario

||

Answers:

Please specify if you want to perform join or not. Since you didn’t mention any primary key. I am assuming you want to concatenate.

import pandas as pd
df1 = pd.read_excel(pwd + '/Book1.xlsx')
df2 = pd.read_excel(pwd + '/Book2.xlsx')


df_combined = pd.concat([df1,df2], ignore_index=True)


print(df_combined)

   list   color   cod animal
0  p111     red   NaN    NaN
1  p222    blue   NaN    NaN
2  p333  green    NaN    NaN
3  p444  yellow   NaN    NaN
4  p555     NaN   NaN    NaN
5   NaN     NaN  p111    rat
6   NaN     NaN  p222    cat
7   NaN     NaN  p333    bat
8   NaN     NaN  p444    hat
9   NaN     NaN  p555   cool

Result will be like above-

Also please provide sample data so that it can be in copy-pasted.

Answered By: GIRIXH

Idea is use DataFrame.explode by splitted values of List, then use left join and aggregate first values with join for expected ouput:

df = (df1.assign(Cod = df1['List'].str.split(';'))
         .explode('Cod')
         .reset_index()
         .merge(df2, how='left', on='Cod')
         .fillna('-')
         .groupby('index')
         .agg(List=('List','first'),
              Color=('Color','first'),
              Animal=('Animal','|'.join),
              Letter=('Letter','|'.join))
         .rename_axis(None))
print (df)
             List   Color       Animal       Letter
0            P111     red          DOG          A,F
1            P999     red            -            -
2  P111;P999;P777    blue  DOG|-|SNAKE    A,F|-|Z,L
3            P555  yellow      CAT,BUG          C,A
4  P666;P111;P333     red  DOG|DOG|BUG  M,C|A,F|S,M
Answered By: jezrael
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.