Removing rows based on the combined value of other rows

Question:

I want to remove the row with "SubAggregate" = 'All' if the other rows with the same "Month" and "MainAggregate" sum (in "ValueTraded") to the same value as the corresponding "SubAggregate" = 'All' row.

My idea was to group by "MainAggregate" and "Month", and if the group total was equal to two times the value in the 'All' row, that row would be deleted. I only got as far as grouping the data:

Tester = data.groupby(["Month", "MainAggregate"], as_index=False)["ValueTraded"].sum()
Tester["ValueTraded"] = Tester["ValueTraded"] / 2

Below is an example of the data and the desired output:

  • Data [image]
  • Desired Output [image]
Asked By: Marc225

Answers:

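You can compute the per-group sum as a new column with GroupBy.transform and compare half of it with the group's first All value, obtained with GroupBy.first. The sum is halved because it includes the All row itself, so a matching group totals twice the All value. Because of duplicates, only the first duplicated row is removed; the later ones are kept by adding DataFrame.duplicated: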

mask = df['subaAggregate'].eq('All')

s = df.groupby(["Month", "MainAggregate"])["valueTrad"].transform('sum').div(2)


s1 = (df.assign(new = df['valueTrad'].where(mask))
        .groupby(["Month", "MainAggregate"])["new"].transform('first'))

out = df[s.ne(s1) | ~mask | df.duplicated(['Month','MainAggregate','subaAggregate'])]
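
To make the mask logic easier to follow, here is a minimal sketch that runs the same steps on a small made-up DataFrame. The sample values are invented for illustration, and the column names follow the spelling used in this answer ('subaAggregate', 'valueTrad'), which is an assumption about the real data:

```python
import pandas as pd

# Hypothetical sample data: in Jan the sub-rows sum to the 'All' value,
# in Feb they do not.
df = pd.DataFrame({
    "Month": ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "MainAggregate": ["A", "A", "A", "A", "A", "A"],
    "subaAggregate": ["All", "x", "y", "All", "x", "y"],
    "valueTrad": [10, 4, 6, 10, 4, 5],
})

keys = ["Month", "MainAggregate"]
mask = df["subaAggregate"].eq("All")

# Half of the group total, broadcast to every row of the group.
s = df.groupby(keys)["valueTrad"].transform("sum").div(2)

# The group's first 'All' value, broadcast to every row of the group.
s1 = (df.assign(new=df["valueTrad"].where(mask))
        .groupby(keys)["new"].transform("first"))

# Keep a row if its group does not match, if it is not an 'All' row,
# or if it is a repeated 'All' row.
out = df[s.ne(s1) | ~mask | df.duplicated(keys + ["subaAggregate"])]
print(out)
```

With this data only the Jan 'All' row (index 0) is dropped; the Feb 'All' row stays because 4 + 5 does not equal 10.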

Alternatively, filter only the All rows with boolean indexing, remove duplicates with DataFrame.drop_duplicates, compare them with the halved group sums, and use the matching indices to drop rows from the original DataFrame:

# the index must be unique; reset it first if it is not
# df = df.reset_index(drop=True)
mask = df['subaAggregate'].eq('All')

s = df.groupby(["Month", "MainAggregate"])["valueTrad"].sum().div(2).rename('new')
df1 = df[mask].drop_duplicates().join(s, on=['Month','MainAggregate'])

out = df.drop(df1.index[df1['valueTrad'].eq(df1['new'])])
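
As a rough illustration of this second variant, the sketch below applies it to a small invented DataFrame. The data is hypothetical, and the column names again assume the spelling used in this answer:

```python
import pandas as pd

# Hypothetical sample data with a unique default index (0..5).
df = pd.DataFrame({
    "Month": ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "MainAggregate": ["A", "A", "A", "A", "A", "A"],
    "subaAggregate": ["All", "x", "y", "All", "x", "y"],
    "valueTrad": [10, 4, 6, 10, 4, 5],
})

keys = ["Month", "MainAggregate"]
mask = df["subaAggregate"].eq("All")

# One halved total per (Month, MainAggregate) group.
s = df.groupby(keys)["valueTrad"].sum().div(2).rename("new")

# Only the 'All' rows, without exact duplicates, with the halved total attached.
df1 = df[mask].drop_duplicates().join(s, on=keys)

# Index labels of the 'All' rows whose value equals the halved group total.
to_drop = df1.index[df1["valueTrad"].eq(df1["new"])]

out = df.drop(to_drop)
print(out)
```

Here to_drop contains only label 0, so the result matches the first approach on the same data. Because DataFrame.drop works on index labels, a non-unique index could remove more rows than intended, which is why resetting the index first is suggested.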
Answered By: jezrael