Find columns negative value in another column – dataframe
Question:
I have this code:
test = {"number": ['1555','1666','1777', '1888', '1999'],
"order_amount": ['100.00','200.00','-200.00', '300.00', '-150.00'],
"number_of_refund": ['','','1666', '', '1888']
}
df = pd.DataFrame(test)
Which returns the following dataframe:
number order_amount number_of_refund
0 1555 100.00
1 1666 200.00
2 1777 -200.00 1666
3 1888 300.00
4 1999 -150.00 1888
I want to remove order and order refund entries if:
- "number_of_refund" matches a number from "number" column (there might not be a number of order in the dataframe if order was made last month and refund during the current month)
- amount of "number_of_refund" (which was matched to "number") has a negative amount of "number" amount (in this case number 1666 has 200, and refund of 1666 has -200 so both rows should be removed)
So the result in this case should be:
number order_amount number_of_refund
0 1555 100.00
3 1888 300.00
4 1999 -150.00 1888
How do I check if amount of one column’s value is in another column but with opposite amount (negative)?
Answers:
IIUC, you can use a boolean indexing approach:
# ensure numeric values
df['order_amount'] = pd.to_numeric(df['order_amount'], errors='coerce')
# is the row a refund?
m1 = df['number_of_refund'].ne('')
# get mapping of refunds
s = df[m1].set_index('number_of_refund')['order_amount']
# get reimbursements and find which ones will equal the original value
reimb = df['number'].map(s)
m2 = reimb.eq(-df['order_amount'])
m3 = df['number_of_refund'].isin(df.loc[m2, 'number'])
# keep rows that do not match any m2 or m3 mask
df = df[~(m2|m3)]
output:
number order_amount number_of_refund
0 1555 100.0
3 1888 300.0
4 1999 -150.0 1888
Let’s say I change the refunded amount for 1999 to be -200.00
test = {"number": ['1555','1666','1777', '1888', '1999'],
"order_amount": ['100.00','200.00','-200.00', '300.00', '-200.00'],
"number_of_refund": ['','','1666', '', '1888'] }
df = pd.DataFrame(test)
print(df)
number order_amount number_of_refund
0 1555 100.00
1 1666 200.00
2 1777 -200.00 1666
3 1888 300.00
4 1999 -200.00 1888
Here’s another way to do it. I create a unique string by concatenating the number_of_refund
(filled with the number
column on the blanks) and the absolute order_amount
(ie, without the negative sign), then drop both duplicates found
df['unique'] = df.apply(lambda x: x['order_amount'].replace('-','')+'|'+x['number'] if x['number_of_refund']=='' else x['order_amount'].replace('-','')+'|'+x['number_of_refund'], axis=1)
#df['unique'] = df['order_amount'].str.replace('-','') + '|' + df['number_of_refund'].mask(df['number_of_refund'].eq(''), df['number']) #the same
print(df)
number order_amount number_of_refund unique
0 1555 100.00 100.00|1555
1 1666 200.00 200.00|1666 #duplicate
2 1777 -200.00 1666 200.00|1666 #duplicate
3 1888 300.00 300.00|1888
4 1999 -200.00 1888 200.00|1888
The duplicate rows are easily identified, and ready to be dropped (including the column unique
)
df = df.drop_duplicates(['unique'], keep=False).drop(columns=['unique'])
print(df)
number order_amount number_of_refund
0 1555 100.00
3 1888 300.00
4 1999 -200.00 1888
I have this code:
test = {"number": ['1555','1666','1777', '1888', '1999'],
"order_amount": ['100.00','200.00','-200.00', '300.00', '-150.00'],
"number_of_refund": ['','','1666', '', '1888']
}
df = pd.DataFrame(test)
Which returns the following dataframe:
number order_amount number_of_refund
0 1555 100.00
1 1666 200.00
2 1777 -200.00 1666
3 1888 300.00
4 1999 -150.00 1888
I want to remove order and order refund entries if:
- "number_of_refund" matches a number from "number" column (there might not be a number of order in the dataframe if order was made last month and refund during the current month)
- amount of "number_of_refund" (which was matched to "number") has a negative amount of "number" amount (in this case number 1666 has 200, and refund of 1666 has -200 so both rows should be removed)
So the result in this case should be:
number order_amount number_of_refund
0 1555 100.00
3 1888 300.00
4 1999 -150.00 1888
How do I check if amount of one column’s value is in another column but with opposite amount (negative)?
IIUC, you can use a boolean indexing approach:
# ensure numeric values
df['order_amount'] = pd.to_numeric(df['order_amount'], errors='coerce')
# is the row a refund?
m1 = df['number_of_refund'].ne('')
# get mapping of refunds
s = df[m1].set_index('number_of_refund')['order_amount']
# get reimbursements and find which ones will equal the original value
reimb = df['number'].map(s)
m2 = reimb.eq(-df['order_amount'])
m3 = df['number_of_refund'].isin(df.loc[m2, 'number'])
# keep rows that do not match any m2 or m3 mask
df = df[~(m2|m3)]
output:
number order_amount number_of_refund
0 1555 100.0
3 1888 300.0
4 1999 -150.0 1888
Let’s say I change the refunded amount for 1999 to be -200.00
test = {"number": ['1555','1666','1777', '1888', '1999'],
"order_amount": ['100.00','200.00','-200.00', '300.00', '-200.00'],
"number_of_refund": ['','','1666', '', '1888'] }
df = pd.DataFrame(test)
print(df)
number order_amount number_of_refund
0 1555 100.00
1 1666 200.00
2 1777 -200.00 1666
3 1888 300.00
4 1999 -200.00 1888
Here’s another way to do it. I create a unique string by concatenating the number_of_refund
(filled with the number
column on the blanks) and the absolute order_amount
(ie, without the negative sign), then drop both duplicates found
df['unique'] = df.apply(lambda x: x['order_amount'].replace('-','')+'|'+x['number'] if x['number_of_refund']=='' else x['order_amount'].replace('-','')+'|'+x['number_of_refund'], axis=1)
#df['unique'] = df['order_amount'].str.replace('-','') + '|' + df['number_of_refund'].mask(df['number_of_refund'].eq(''), df['number']) #the same
print(df)
number order_amount number_of_refund unique
0 1555 100.00 100.00|1555
1 1666 200.00 200.00|1666 #duplicate
2 1777 -200.00 1666 200.00|1666 #duplicate
3 1888 300.00 300.00|1888
4 1999 -200.00 1888 200.00|1888
The duplicate rows are easily identified, and ready to be dropped (including the column unique
)
df = df.drop_duplicates(['unique'], keep=False).drop(columns=['unique'])
print(df)
number order_amount number_of_refund
0 1555 100.00
3 1888 300.00
4 1999 -200.00 1888