Pandas drop only duplicates matching a condition

Question

I have a DataFrame with duplicate values in "Model Number". For each pair of duplicates, I want to keep the row whose "Difference" equals 0.00 and drop the other one. If neither row in a pair has a "Difference" of 0.00, I want to keep both rows.
Original Dataframe

Desired Dataframe

Thank you for your help.

Asked By: Joshua Young


Answer 1

Try the following code:

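import pandas as pd
# Rows whose Difference is exactly zero
df1 = df[df["Difference"] == 0]
# Model numbers that have no zero-difference row at all
lst_model = [i for i in df["Model Number"].unique() if i not in df1["Model Number"].unique()]
df2 = df[df["Model Number"].isin(lst_model)]
# Zero-difference rows plus every row of the models without one
df_final = pd.concat([df1,df2])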

Answered By: Sachin Kohli
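
The filter in Answer 1 can also be written as a single boolean mask, which keeps the rows in their original order. The following is only a minimal sketch: "Model Number" and "Difference" come from the question, while the "Price" and "New Price" column names, the sample values (borrowed from Answer 2), and the variable names is_zero and has_zero are assumptions made for illustration.

```python
import pandas as pd

# Sample data: values borrowed from Answer 2; "Price" and "New Price"
# are assumed column names, not taken from the question.
df = pd.DataFrame({
    "Model Number": ["401", "401", "54", "54", "55A", "55A", "248", "248"],
    "Price": [14.97, 14.97, 6.44, 6.44, 12.47, 12.47, 17.66, 17.66],
    "New Price": [14.97, 7.94, 6.44, 17.89, 19.89, 8.47, 17.66, 14.51],
})
df["Difference"] = df["Price"] - df["New Price"]

# True for rows whose Difference is exactly zero.
is_zero = df["Difference"] == 0
# True for every row whose model has at least one zero-difference row.
has_zero = df["Model Number"].isin(df.loc[is_zero, "Model Number"])

# Keep the zero-difference rows, plus all rows of models without one.
df_final = df[is_zero | ~has_zero]
```

If the differences are the result of floating-point arithmetic, an exact comparison with 0 may miss values that are only approximately zero; a tolerance check such as numpy.isclose might be safer in that case.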

Answer 2

import pandas as pd
import numpy as np
col_1 = ['401','401','54','54','55A','55A','248','248']
col_2 = [14.97,14.97,6.44, 6.44,12.47,12.47,17.66,17.66]
col_3 = [14.97,7.94 ,6.44,17.89,19.89,8.47,17.66,14.51]
col_4 = np.array(col_2) - np.array(col_3)

df = pd.DataFrame({'model_number':col_1,'price':col_2,'new_price':col_3,'difference':col_4}) # create the dataframe
df['is_difference_zero'] = df['difference'] == 0 # Flag the rows where difference is zero
# Attach the per-model count of zero-difference rows; the merge suffixes the
# row-level flag with _x and the per-model count with _y
df = pd.merge(df,df.groupby('model_number').agg({'is_difference_zero':'sum'})['is_difference_zero'],on='model_number')
# x + y equals 1 only for a non-zero row whose model has exactly one zero-difference row; drop those rows
df_final = df[df['is_difference_zero_x'] + df['is_difference_zero_y'] != 1][['model_number','price','new_price','difference']]

Answered By: Christensen Daniel
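
The merge in Answer 2 exists to broadcast a per-model statistic back onto each row. As a possible alternative, a groupby transform does the same broadcast without creating the _x and _y helper columns. This is a sketch under the same sample data as Answer 2, not the answerer's method; the names is_zero and group_has_zero are illustrative.

```python
import pandas as pd

# Same sample data as Answer 2.
df = pd.DataFrame({
    "model_number": ["401", "401", "54", "54", "55A", "55A", "248", "248"],
    "price": [14.97, 14.97, 6.44, 6.44, 12.47, 12.47, 17.66, 17.66],
    "new_price": [14.97, 7.94, 6.44, 17.89, 19.89, 8.47, 17.66, 14.51],
})
df["difference"] = df["price"] - df["new_price"]

# Row-level flag: is this row's difference exactly zero?
is_zero = df["difference"].eq(0)
# Group-level flag, broadcast to every row of the group:
# does this model have at least one zero-difference row?
group_has_zero = is_zero.groupby(df["model_number"]).transform("any")

# Drop only the non-zero rows of models that also have a zero row.
df_final = df[is_zero | ~group_has_zero]
```

Because transform returns a Series aligned with the original index, the result can be used directly as a mask and no extra columns need to be removed afterwards.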
