Compare quantity of each item in dataframe in pandas

Question:

I fairly new to programming in python and using the pandas library
and I am having problems with comparing 2 dataframes with different quantities
i want to see if the quantity in data_1 is less than the quantity in data_2 for each item

import pandas as pd

data_1 = [['banana',10],['orange',2],['strawberry',3]]
data_2 = [['banana',1],['orange',2],['strawberry',5],['melon',8]]

df_1 = pd.Dataframe(data_1,columns = ['item','quantity'])
df_2 = pd.Dataframe(data_2,columns = ['item','quantity'])

enter image description here

enter image description here

i was trying to use pd.merge() the 2 dataframes to compare but it’s not quite what i was looking for…..
i needed a 3° dataframe with the differences only

Answers:

For such comparison item should be index and then you might concat quantity columns and compute desired column following way

import pandas as pd

data_1 = [['banana',10],['orange',2],['strawberry',3]]
data_2 = [['banana',1],['orange',2],['strawberry',5],['melon',8]]

df_1 = pd.DataFrame(data_1,columns = ['item','quantity'])
df_2 = pd.DataFrame(data_2,columns = ['item','quantity'])

df = pd.concat({"data1":df_1.set_index("item").quantity,"data2":df_2.set_index("item").quantity},axis=1,sort=False)
df['isless'] = df['data1'] < df['data2']
print(df)

output

            data1  data2  isless
banana       10.0      1   False
orange        2.0      2   False
strawberry    3.0      5    True
melon         NaN      8   False
Answered By: Daweo

If your looking at the difference between df_1 and df_2, you can substract the 'quantity' columns after seting the 'item' as the index:

import pandas as pd

data_1 = [['banana',10],['orange',2],['strawberry',3]]
data_2 = [['banana',1],['orange',2],['strawberry',5],['melon',8]]

df_1 = pd.DataFrame(data_1,columns = ['item','quantity'])
df_2 = pd.DataFrame(data_2,columns = ['item','quantity'])

# Setting 'item' as the index 
df_1.set_index('item', inplace=True)
df_2.set_index('item', inplace=True)

df_3 = df_1.subtract(df_2,fill_value=0)>0
df_3.columns=['df1>df2']

outputs:

            df1>df2
item               
banana         True
melon         False
orange        False
strawberry    False

Answered By: Mateusz BOGDAN
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.