Python/Pandas : How to self join a pandas dataframe on rows with same index

Question:

I have a dataframe that looks like below

merge_id identifier Location Value
1 A1 DEL 50
1 B2 HYD 60
2 C1 BEN 80
2 D2 HYD 10

I want the output dataframe to look like below

merge_id identifier Location Value m_identifier m_Location m_Value
1 A1 DEL 50 B2 HYD 60
2 C1 BEN 80 D2 HYD 10

Please can you suggest how I can do that

Asked By: Dolliy

||

Answers:

here is one way about it

df2=df.merge(df.mask(df['identifier'].str.endswith('1')),
         on='merge_id',
          how='left',
        suffixes=(None,'_m'))
df2=df2.mask(df2['identifier'].eq(df2['identifier_m']))
df2.dropna()
    merge_id    identifier  Location    Value   identifier_m    Location_m  Value_m
0        1.0            A1       DEL     50.0             B2           HYD     60.0
2        2.0            C1       BEN     80.0             D2           HYD     10.0
Answered By: Naveed

This looks like a pivot with a few tweaks:

df2 = (df.assign(c=df.groupby('merge_id').cumcount())
         .pivot(index='merge_id', columns='c')
         .sort_index(level=1, sort_remaining=False, axis=1)
      )

df2.columns = df2.columns.map(lambda x: f'{"m_" if x[1] else ""}{x[0]}')

print(df2.reset_index())

output:

   merge_id identifier Location  Value m_identifier m_Location  m_Value
0         1         A1      DEL     50           B2        HYD       60
1         2         C1      BEN     80           D2        HYD       10
Answered By: mozway

Another possible solution:

grouped = df.groupby('merge_id')
df1 = df.loc[grouped.head(1).index]
df2 = df.loc[grouped.tail(1).index].add_prefix('m_')
out = (df1.merge(df2, left_on='merge_id', right_on='m_merge_id')
       .drop('m_merge_id', axis = 1))

Output:

   merge_id identifier Location  Value m_identifier m_Location  m_Value
0         1         A1      DEL     50           B2        HYD       60
1         2         C1      BEN     80           D2        HYD       10
Answered By: PaulS
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.