merge rows into new column value
Question:
I have a df where every ref_num appears in exactly two rows (duplicate pairs). For each pair, I want to take the Amt from the second row and put it on the first row in a new column called 'new_Amt', inserting NaN in 'new_Amt' for the second row. Afterwards I'll drop all rows that contain NaN.
So the dataframe looks like this:
|       | ref_num | Amt |
|-------|---------|-----|
| row 1 | 1       | 10  |
| row 2 | 1       | 20  |
| row 3 | 2       | 5   |
| row 4 | 2       | 15  |
| row 5 | 3       | 12  |
| row 6 | 3       | 7   |
after it should look like this:
|       | ref_num | Amt | new_Amt |
|-------|---------|-----|---------|
| row 1 | 1       | 10  | 20      |
| row 2 | 1       | 20  | NaN     |
| row 3 | 2       | 5   | 15      |
| row 4 | 2       | 15  | NaN     |
| row 5 | 3       | 12  | 7       |
| row 6 | 3       | 7   | NaN     |
I thought a lambda function could work, where the else branch would return NaN for the second row of each duplicate pair, but I couldn't figure out the syntax.
df['new_Amt'] = df.apply(lambda x : x['Amt'] if x['ref_num'] == x['ref_num'] else x['new_Amt'] is NaN)
Answers:
Why not do both operations at once (resolve duplicates as you describe and drop the redundant rows)?
k = 'ref_num'
newdf = df.drop_duplicates(subset=k, keep='first').merge(
df.drop_duplicates(subset=k, keep='last'), on='ref_num', suffixes=('', '_new'))
>>> newdf
ref_num Amt Amt_new
0 1 10 20
1 2 5 15
2 3 12 7
Another possibility:
gb = df.groupby('ref_num')['Amt']
newdf = pd.concat([gb.first(), gb.last()], axis=1, keys=['Amt', 'new_Amt']).reset_index()
>>> newdf
ref_num Amt new_Amt
0 1 10 20
1 2 5 15
2 3 12 7
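If you do want the intermediate two-step flow exactly as described in the question (a `new_Amt` column with NaN on each second row, then dropping those rows), one sketch uses `groupby` + `shift`:

```python
import pandas as pd

df = pd.DataFrame({'ref_num': [1, 1, 2, 2, 3, 3],
                   'Amt': [10, 20, 5, 15, 12, 7]})

# Within each ref_num group, pull the next row's Amt up one row;
# the second row of each pair has no next row and becomes NaN
df['new_Amt'] = df.groupby('ref_num')['Amt'].shift(-1)

# Drop the rows that received NaN, as described in the question
result = df.dropna(subset=['new_Amt']).reset_index(drop=True)
```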
Note: in your question it is not clear whether 'row 1', 'row 2', etc. are indices, meant to be kept or not. If they are desired in the final output, please let us know if and how they should appear.