merge rows into new column value

Question:

I have a DataFrame made up entirely of duplicate-value pairs. For each pair, I want to take the 'Amt' value (2nd column) from the second row and add it to the first row in a new column called 'new_Amt', inserting NaN in that new (third) column for the second row. Afterwards I'll drop all rows that contain NaN.

So the DataFrame looks like this:

       ref_num  Amt
row 1        1   10
row 2        1   20
row 3        2    5
row 4        2   15
row 5        3   12
row 6        3    7

Afterwards it should look like this:

       ref_num  Amt  new_Amt
row 1        1   10       20
row 2        1   20      NaN
row 3        2    5       15
row 4        2   15      NaN
row 5        3   12        7
row 6        3    7      NaN

I thought a lambda function could work, with the else branch returning NaN for the second row of each duplicate pair, but I couldn't figure out the syntax:

df['new_Amt'] = df.apply(lambda x : x['Amt'] if x['ref_num'] == x['ref_num'] else x['new_Amt'] is NaN)
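The lambda above cannot work as written: df.apply without axis=1 passes each column (not each row) to the function, and even row-wise, x['ref_num'] == x['ref_num'] compares a value with itself, so it is always true. A minimal sketch of the intended two-step approach (fill new_Amt on the first row of each pair, NaN on the second, then drop the NaN rows) uses a grouped shift(-1) instead of a lambda. It assumes each ref_num appears exactly twice, as in the question, and that 'row 1' ... 'row 6' are index labels.

```python
import pandas as pd

# Sample data from the question; 'row 1'..'row 6' are assumed to be index labels
df = pd.DataFrame(
    {'ref_num': [1, 1, 2, 2, 3, 3], 'Amt': [10, 20, 5, 15, 12, 7]},
    index=[f'row {i}' for i in range(1, 7)],
)

# Within each ref_num group, pull the next row's Amt up by one row.
# The last row of each group has no "next" row, so it gets NaN.
df['new_Amt'] = df.groupby('ref_num')['Amt'].shift(-1)
print(df)

# Second step from the question: drop the rows that received NaN.
# NaN forced new_Amt to float, so cast it back to int once the NaNs are gone.
result = df.dropna(subset=['new_Amt']).astype({'new_Amt': int})
print(result)
```

Because shift keeps the original index, the surviving rows keep their 'row 1', 'row 3', 'row 5' labels.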

Asked By: brewig615


Answers:

Why not do both operations at once (resolve duplicates as you describe and drop the redundant rows)?

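k = 'ref_num'
# Keep the first row of each pair and merge in the last row of the same pair;
# the overlapping 'Amt' column from the right side gets the '_new' suffix.
newdf = df.drop_duplicates(subset=k, keep='first').merge(
    df.drop_duplicates(subset=k, keep='last'), on=k, suffixes=('', '_new'))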
>>> newdf
   ref_num  Amt  Amt_new
0        1   10       20
1        2    5       15
2        3   12        7
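A self-contained run of the merge approach, assuming the question's sample data. The rename call is an addition (not part of the answer) that maps the Amt_new column produced by suffixes=('', '_new') to the new_Amt name the question asked for. Note that merge returns a fresh 0-based index, so any 'row N' labels are not carried over.

```python
import pandas as pd

df = pd.DataFrame({'ref_num': [1, 1, 2, 2, 3, 3], 'Amt': [10, 20, 5, 15, 12, 7]})

k = 'ref_num'
first = df.drop_duplicates(subset=k, keep='first')  # first row of each pair
last = df.drop_duplicates(subset=k, keep='last')    # second row of each pair

# Join the two halves on ref_num; the right-hand 'Amt' becomes 'Amt_new',
# which is then renamed to the column name used in the question.
newdf = (
    first.merge(last, on=k, suffixes=('', '_new'))
    .rename(columns={'Amt_new': 'new_Amt'})
)
print(newdf)
```

One caveat: if some ref_num occurred only once, keep='first' and keep='last' would select the same row, so that row would be paired with itself rather than dropped.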

Another possibility:

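import pandas as pd

gb = df.groupby('ref_num')['Amt']
# first()/last() give the first and second Amt of each pair, indexed by ref_num
newdf = pd.concat([gb.first(), gb.last()], axis=1, keys=['Amt', 'new_Amt']).reset_index()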
>>> newdf
   ref_num  Amt  new_Amt
0        1   10       20
1        2    5       15
2        3   12        7
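first() and last() pick the first and last Amt in each group, so with exactly two rows per ref_num they are the two halves of the pair; a middle row in a group of three would be silently ignored. If that could happen, one alternative (a different technique, not part of the answer above) is to number the rows within each group with cumcount and pivot on that number, giving one column per position. A minimal sketch, assuming the question's sample data; the helper column name pos is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ref_num': [1, 1, 2, 2, 3, 3], 'Amt': [10, 20, 5, 15, 12, 7]})

# Position of each row within its ref_num group: 0 for the first, 1 for the second, ...
df['pos'] = df.groupby('ref_num').cumcount()

# One row per ref_num, one column per position (missing positions would become NaN)
wide = df.pivot(index='ref_num', columns='pos', values='Amt')

# For pairs, positions 0 and 1 correspond to the question's Amt / new_Amt
wide = wide.rename(columns={0: 'Amt', 1: 'new_Amt'}).reset_index()
wide.columns.name = None  # drop the leftover 'pos' label on the columns axis
print(wide)
```

With groups larger than two, the extra positions simply appear as additional columns (2, 3, ...) instead of being discarded.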

Note: your question doesn't make clear whether 'row 1', 'row 2', etc. are index labels, or whether they should be kept. If you want them in the final output, please let us know how they should appear.

Answered By: Pierre D