Removing rows based on the combined value of other rows
Question:
I want to remove the row with "SubAggregate" = 'All' if the "ValueTraded" values of the other rows with the same "Month" and "MainAggregate" sum to the value in that 'All' row.
My idea was to group by "MainAggregate" and "Month", and if the group total was equal to two times the value in the 'All' row, that row would be deleted. I only got as far as grouping the data:
Tester = data.groupby(["Month", "MainAggregate"], as_index=False)["ValueTraded"].sum()
Tester["ValueTraded"] = Tester["ValueTraded"] / 2
Below is an example of the data and the desired output:
Answers:
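Each group's total includes the 'All' row itself, so when the other rows add up to the 'All' value, half of the group total equals that value. You can compute the sum per group as a new column with GroupBy.transform, divide it by 2, and compare it with the first 'All' value of each group (the 'first' aggregation, as in GroupBy.first). Because the data can contain duplicated 'All' rows, add DataFrame.duplicated so that only the first of the duplicates is removed: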
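# rows that hold the 'All' subtotal
mask = df['subaAggregate'].eq('All')
# half of each group's total, broadcast to every row of the group
s = df.groupby(["Month", "MainAggregate"])["valueTrad"].transform('sum').div(2)
# first 'All' value of each group, broadcast to every row of the group
s1 = (df.assign(new = df['valueTrad'].where(mask))
        .groupby(["Month", "MainAggregate"])["new"].transform('first'))
# keep rows of non-matching groups, non-'All' rows, and repeated 'All' rows
out = df[s.ne(s1) | ~mask | df.duplicated(['Month','MainAggregate','subaAggregate'])]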
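To see how the three conditions combine, here is a minimal sketch on a small frame. The rows are invented for illustration and are not the data from the question; the column names are the ones used in the snippet above.

```python
import pandas as pd

# Hypothetical data, made up for this example only.
df = pd.DataFrame({
    "Month":         [1, 1, 1, 1, 1, 2, 2],
    "MainAggregate": ["A", "A", "A", "B", "B", "A", "A"],
    "subaAggregate": ["All", "x", "y", "All", "x", "All", "All"],
    "valueTrad":     [10, 4, 6, 10, 3, 5, 5],
})

keys = ["Month", "MainAggregate"]
mask = df["subaAggregate"].eq("All")

# Half of each group's total, aligned with the original rows.
s = df.groupby(keys)["valueTrad"].transform("sum").div(2)

# First 'All' value per group (non-'All' rows are NaN and get skipped).
s1 = (df.assign(new=df["valueTrad"].where(mask))
        .groupby(keys)["new"].transform("first"))

# Keep a row if its group does not match, if it is not an 'All' row,
# or if it is a repeated 'All' row (only the first repeat is removed).
keep = s.ne(s1) | ~mask | df.duplicated(keys + ["subaAggregate"])
out = df[keep]

print(out.index.tolist())  # [1, 2, 3, 4, 6]
```

Group (1, A) sums to 20, half of which equals its 'All' value of 10, so row 0 goes. Group (1, B) gives 6.5 against 10, so nothing is removed. Group (2, A) consists of two identical 'All' rows, and only the first of them (row 5) is removed.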
Alternatively, filter only the 'All' rows with boolean indexing, remove duplicates with DataFrame.drop_duplicates, compare each remaining row with half of its group's total, and use the index of the matching rows to drop them from the original DataFrame:
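# the index must be unique, because rows are dropped by index label; if it is not:
# df = df.reset_index(drop=True)
mask = df['subaAggregate'].eq('All')
# half of each group's total, one value per (Month, MainAggregate)
s = df.groupby(["Month", "MainAggregate"])["valueTrad"].sum().div(2).rename('new')
# unique 'All' rows with their group's half-total attached as column 'new'
df1 = df[mask].drop_duplicates().join(s, on=['Month','MainAggregate'])
# drop the 'All' rows whose value equals the half-total
out = df.drop(df1.index[df1['valueTrad'].eq(df1['new'])])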
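A minimal sketch of this join-and-drop variant on a small frame follows. The rows are invented for illustration and are not the data from the question, and the intermediate name `matches` is introduced here only for readability.

```python
import pandas as pd

# Hypothetical data, made up for this example only.
df = pd.DataFrame({
    "Month":         [1, 1, 1, 1, 1, 2, 2],
    "MainAggregate": ["A", "A", "A", "B", "B", "A", "A"],
    "subaAggregate": ["All", "x", "y", "All", "x", "All", "All"],
    "valueTrad":     [10, 4, 6, 10, 3, 5, 5],
})

keys = ["Month", "MainAggregate"]
mask = df["subaAggregate"].eq("All")

# One half-total per group, indexed by (Month, MainAggregate).
s = df.groupby(keys)["valueTrad"].sum().div(2).rename("new")

# Unique 'All' rows, each with its group's half-total in column 'new'.
df1 = df[mask].drop_duplicates().join(s, on=keys)

# 'All' rows whose value equals the half-total are the ones to drop.
matches = df1["valueTrad"].eq(df1["new"])
out = df.drop(df1[matches].index)

print(out.index.tolist())  # [1, 2, 3, 4, 6]
```

Because drop_duplicates keeps the first of the two identical 'All' rows in group (2, A), only row 5 ends up in `df1` and gets dropped, while its repeat in row 6 stays in the result.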