Pandas drop only duplicates matching a condition
Question:
I have a dataframe with duplicates in "Model Number". I want to keep each row that has "Difference" equaling 0.00 and remove its duplicate, but if a duplicate pair does not have a "Difference" equaling 0.00 then I don’t want to remove it.
Original Dataframe
Thank you for your help.
Answers:
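Try the following code:
import pandas as pd
# rows whose difference is zero are always kept
df1 = df[df["Difference"] == 0]
# model numbers that have no zero-difference row at all
lst_model = [i for i in df["Model Number"].unique() if i not in df1["Model Number"].unique()]
# keep every row of those models, duplicates included
df2 = df[df["Model Number"].isin(lst_model)]
df_final = pd.concat([df1,df2]) # zero-difference rows come first; original row order is not preserved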
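The same idea can also be written as a single boolean mask, which keeps the rows in their original order. This is only a sketch: the sample values and the "Price" and "New Price" column names are assumptions made for illustration, since the question only names "Model Number" and "Difference".

```python
import pandas as pd

# Hypothetical sample data; only "Model Number" and "Difference" come from the question.
df = pd.DataFrame({
    "Model Number": ["401", "401", "54", "54", "55A", "55A", "248", "248"],
    "Price": [14.97, 14.97, 6.44, 6.44, 12.47, 12.47, 17.66, 17.66],
    "New Price": [14.97, 7.94, 6.44, 17.89, 19.89, 8.47, 17.66, 14.51],
})
df["Difference"] = df["Price"] - df["New Price"]

# True for rows whose difference is exactly zero.
is_zero = df["Difference"] == 0

# Model numbers that have at least one zero-difference row.
models_with_zero = df.loc[is_zero, "Model Number"]

# Keep a row if it is a zero-difference row, or if its model has no such row at all.
keep = is_zero | ~df["Model Number"].isin(models_with_zero)
df_final = df[keep]

print(df_final)
```

Because the filter is a mask over the original frame, no concatenation is needed and the index of the kept rows is unchanged.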
import pandas as pd
import numpy as np
col_1 = ['401','401','54','54','55A','55A','248','248']
col_2 = [14.97,14.97,6.44, 6.44,12.47,12.47,17.66,17.66]
col_3 = [14.97,7.94 ,6.44,17.89,19.89,8.47,17.66,14.51]
col_4 = np.array(col_2) - np.array(col_3)
df = pd.DataFrame({'model_number':col_1,'price':col_2,'new_price':col_3,'difference':col_4}) # create the dataframe
df['is_difference_zero'] = df['difference'] == 0 # flag the rows where difference is zero
# Attach the per-model count of zero-difference rows; after the merge the row flag is is_difference_zero_x and the count is is_difference_zero_y
df = pd.merge(df,df.groupby('model_number').agg({'is_difference_zero':'sum'})['is_difference_zero'],on='model_number')
# For a pair with one zero-difference row, x + y is 2 for that row and 1 for its duplicate; if neither row has a zero difference, x + y is 0 for both
# Keep the rows where x + y is not one, i.e. drop only the duplicates of zero-difference rows
df_final = df[df['is_difference_zero_x'] + df['is_difference_zero_y'] != 1][['model_number','price','new_price','difference']]
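If you would rather avoid the merge and the helper columns, a grouped transform may be enough. The sketch below reuses the sample data from this answer; the use of np.isclose and its tolerance are my assumptions, added in case the differences carry floating-point rounding noise, and were not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "model_number": ["401", "401", "54", "54", "55A", "55A", "248", "248"],
    "price": [14.97, 14.97, 6.44, 6.44, 12.47, 12.47, 17.66, 17.66],
    "new_price": [14.97, 7.94, 6.44, 17.89, 19.89, 8.47, 17.66, 14.51],
})
df["difference"] = df["price"] - df["new_price"]

# Treat differences within a small tolerance as zero (the tolerance is an assumption).
is_zero = pd.Series(np.isclose(df["difference"], 0.0, atol=1e-9), index=df.index)

# For every row, True if any row with the same model_number has a zero difference.
group_has_zero = is_zero.groupby(df["model_number"]).transform("any")

# Keep zero-difference rows, plus every row of a model that has no zero-difference row.
df_final = df[is_zero | ~group_has_zero]

print(df_final)
```

Like the merge version, this keeps both rows of a model such as 55A, where neither difference is zero, and it leaves the original columns and row order untouched.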