Dropping rows that fall below a certain percentage threshold of the total rows/sum [Python]
Question:
I am having trouble filtering out the crime types (the "OffenseDescription" column) that account for less than 5% of the total rows in the DataFrame. A specific or a general solution would help, so I can reproduce it or adjust the threshold as needed.
I’m doing this in a Jupyter Notebook in VS Code. This is the code I’ve tried so far, but it crashes the kernel and the cell seems to run indefinitely:
tot = crime.OffenseDescription.sum()  # find sum of column
crime[crime.groupby(['OffenseDescription']).transform(lambda x: (x.div(tot) * 100) < 0.05)]  # calculate percentage, filter as per condition
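A likely cause of the problem, offered as a hedged diagnosis: `OffenseDescription` holds strings, so `.sum()` concatenates every value into one long string instead of returning a number, and the groupby/transform step then works on text. For "percentage of total rows" the denominator is the row count, `len(crime)`. Below is a minimal sketch of the groupby/transform idea done that way; the small DataFrame stands in for `crime` and its data is made up.

```python
import pandas as pd

# Hypothetical stand-in for the `crime` DataFrame from the question (30 rows).
crime = pd.DataFrame(
    {"OffenseDescription": ["THEFT"] * 20 + ["ASSAULT"] * 9 + ["ARSON"] * 1}
)

# Size of each row's group, broadcast back to every row of that group.
group_size = crime.groupby("OffenseDescription")["OffenseDescription"].transform("size")

# Divide by the total number of rows (not a column sum) to get a fraction.
share = group_size / len(crime)

# Keep rows whose offense type makes up at least 5% of all rows.
# ARSON is 1/30 (about 3.3%), so its row is dropped.
filtered = crime[share >= 0.05]
print(filtered["OffenseDescription"].value_counts())
```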
Link to a screenshot of .head() of the dataframe I am using:

TIA
Answers:
Use Series.value_counts with normalize=True to get each offense type's share of the rows, then remove the groups below 0.05 (5%) with boolean indexing, keeping only rows whose mapped share is greater than or equal to 0.05:
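# share of rows for each offense type (fractions that sum to 1)
percentage = crime.OffenseDescription.value_counts(normalize=True)
# keep only rows whose offense type makes up at least 5% of all rows
crime[crime['OffenseDescription'].map(percentage) >= 0.05]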
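A self-contained version of the same approach, wrapped in a small helper so the column and threshold can be adjusted. The helper name `drop_rare_categories` and the sample data are hypothetical; the pandas calls (`value_counts(normalize=True)`, `map`, boolean indexing) are the ones used in the answer.

```python
import pandas as pd


def drop_rare_categories(df: pd.DataFrame, column: str, threshold: float = 0.05) -> pd.DataFrame:
    """Return the rows of `df` whose value in `column` accounts for at least
    `threshold` (a fraction, e.g. 0.05 for 5%) of all rows."""
    # Fraction of rows per category. value_counts drops NaN by default,
    # so rows with NaN map to NaN and fail the comparison below.
    shares = df[column].value_counts(normalize=True)
    # Look up each row's category share and keep rows at or above the threshold.
    return df[df[column].map(shares) >= threshold]


# Hypothetical sample data: THEFT 20/30, ASSAULT 9/30, ARSON 1/30.
crime = pd.DataFrame(
    {"OffenseDescription": ["THEFT"] * 20 + ["ASSAULT"] * 9 + ["ARSON"] * 1}
)

kept = drop_rare_categories(crime, "OffenseDescription")             # 5% threshold
kept_half = drop_rare_categories(crime, "OffenseDescription", 0.5)   # 50% threshold
print(kept["OffenseDescription"].unique())
print(kept_half["OffenseDescription"].unique())
```

The threshold is a fraction, so 5% is written as 0.05; multiplying by 100 and then comparing against 0.05 would filter at 0.05% instead. If the share should be weighted by a numeric column rather than by row count (a percentage of a sum), `df.groupby(column)[value_col].sum() / df[value_col].sum()` gives per-category shares that can be mapped the same way (`value_col` being a hypothetical numeric column).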