Python/Pandas: How to self-join a pandas DataFrame on rows with the same index

Question

I have a dataframe that looks like this:

merge_id	identifier	Location	Value
1	A1	DEL	50
1	B2	HYD	60
2	C1	BEN	80
2	D2	HYD	10

I want the output dataframe to look like this:

merge_id	identifier	Location	Value	m_identifier	m_Location	m_Value
1	A1	DEL	50	B2	HYD	60
2	C1	BEN	80	D2	HYD	10

Could you please suggest how I can do that?

Asked By: Dolliy

Answer 1

Here is one way to go about it:

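# Blank out the rows whose identifier ends in '1', so only the second row
# of each pair can match on the right-hand side
df2 = df.merge(df.mask(df['identifier'].str.endswith('1')),
               on='merge_id',
               how='left',
               suffixes=(None, '_m'))
# Blank out the rows that were matched with themselves, then drop them
df2 = df2.mask(df2['identifier'].eq(df2['identifier_m']))
df2.dropna()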

    merge_id    identifier  Location    Value   identifier_m    Location_m  Value_m
0        1.0            A1       DEL     50.0             B2           HYD     60.0
2        2.0            C1       BEN     80.0             D2           HYD     10.0
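
Note that `mask` fills the blanked rows with NaN, which is why `merge_id` and the value columns come back as floats, and that the result of `df2.dropna()` is not assigned to anything. The following is a minimal sketch, not part of the original answer, that keeps the result, casts the numbers back to integers, and renames the `_m` suffix to the `m_` prefix the question asked for. The `df` construction and the name `out` are assumptions made for illustration.

```python
import pandas as pd

# Sample frame from the question (construction is an assumption)
df = pd.DataFrame({
    "merge_id": [1, 1, 2, 2],
    "identifier": ["A1", "B2", "C1", "D2"],
    "Location": ["DEL", "HYD", "BEN", "HYD"],
    "Value": [50, 60, 80, 10],
})

# Same steps as the answer above
right = df.mask(df["identifier"].str.endswith("1"))
df2 = df.merge(right, on="merge_id", how="left", suffixes=(None, "_m"))
df2 = df2.mask(df2["identifier"].eq(df2["identifier_m"]))

out = (
    df2.dropna()
       # mask() introduced NaN, so the numeric columns became floats
       .astype({"merge_id": "int64", "Value": "int64", "Value_m": "int64"})
       # turn the '_m' suffix into an 'm_' prefix
       .rename(columns=lambda c: f"m_{c[:-2]}" if c.endswith("_m") else c)
       .reset_index(drop=True)
)
print(out)
```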

Answered By: Naveed

Answer 2

This looks like a pivot with a few tweaks:

df2 = (df.assign(c=df.groupby('merge_id').cumcount())
         .pivot(index='merge_id', columns='c')
         .sort_index(level=1, sort_remaining=False, axis=1)
      )

df2.columns = df2.columns.map(lambda x: f'{"m_" if x[1] else ""}{x[0]}')

print(df2.reset_index())

Output:

   merge_id identifier Location  Value m_identifier m_Location  m_Value
0         1         A1      DEL     50           B2        HYD       60
1         2         C1      BEN     80           D2        HYD       10
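
To see why this works, it may help to look at the two "tweaks" in isolation. The sketch below is not part of the original answer and assumes the sample frame from the question; the `df` construction and the names `c` and `flatten` are assumptions made for illustration.

```python
import pandas as pd

# Sample frame from the question (construction is an assumption)
df = pd.DataFrame({
    "merge_id": [1, 1, 2, 2],
    "identifier": ["A1", "B2", "C1", "D2"],
    "Location": ["DEL", "HYD", "BEN", "HYD"],
    "Value": [50, 60, 80, 10],
})

# cumcount numbers the rows inside each merge_id group:
# 0 for the first row of a pair, 1 for the second
c = df.groupby("merge_id").cumcount()
print(c.tolist())  # [0, 1, 0, 1]

# After the pivot, every column label is a (name, c) tuple.
# The lambda adds the "m_" prefix only when c is non-zero.
flatten = lambda x: f'{"m_" if x[1] else ""}{x[0]}'
print(flatten(("Value", 0)), flatten(("Value", 1)))  # Value m_Value
```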

Answered By: mozway

Answer 3

Another possible solution:

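grouped = df.groupby('merge_id')
# First row of each merge_id group
df1 = df.loc[grouped.head(1).index]
# Last row of each group, with every column prefixed by 'm_'
df2 = df.loc[grouped.tail(1).index].add_prefix('m_')
# Join the two halves on the (prefixed) key and drop the duplicate key column
out = (df1.merge(df2, left_on='merge_id', right_on='m_merge_id')
       .drop('m_merge_id', axis=1))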

Output:

   merge_id identifier Location  Value m_identifier m_Location  m_Value
0         1         A1      DEL     50           B2        HYD       60
1         2         C1      BEN     80           D2        HYD       10
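
This approach relies on every `merge_id` having exactly two rows: with a single row, `head(1)` and `tail(1)` return the same row, and with three or more, the middle rows are silently dropped. A quick check before pairing might look like the sketch below. It is not part of the original answer and assumes the sample frame from the question; the `df` construction and the names `sizes`, `pairs_ok`, `first` and `last` are assumptions made for illustration.

```python
import pandas as pd

# Sample frame from the question (construction is an assumption)
df = pd.DataFrame({
    "merge_id": [1, 1, 2, 2],
    "identifier": ["A1", "B2", "C1", "D2"],
    "Location": ["DEL", "HYD", "BEN", "HYD"],
    "Value": [50, 60, 80, 10],
})

# Every merge_id should occur exactly twice for head(1)/tail(1) to form a pair
sizes = df.groupby("merge_id").size()
pairs_ok = bool(sizes.eq(2).all())
print(pairs_ok)  # True for the sample frame

# Same pairing as the answer above
grouped = df.groupby("merge_id")
first = grouped.head(1)
last = grouped.tail(1).add_prefix("m_")
out = (first.merge(last, left_on="merge_id", right_on="m_merge_id")
            .drop(columns="m_merge_id"))
print(out)
```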

Answered By: PaulS
