Dropping rows that fall below a certain percentage threshold of the total rows/sum [Python]


I am having trouble filtering out the crimes (the "OffenseDescription" column) that account for less than 5% of the total rows in the dataframe. Either a specific or a general solution would help, so that I can reproduce it and adjust the requirements as needed.

What I’ve tried so far crashes the kernel and appears to run indefinitely.

I’m also doing this in VS Code, via a Jupyter Notebook.

This is the code I’ve attempted so far:

  tot=crime.OffenseDescription.sum()  #Find sum of column 
  crime[crime.groupby(['OffenseDescriptiom']).transform(lambda x:
  (x.div(tot)*100)<0.05)]   #calculate percentage filter as per

Link to a screenshot of .head() of the dataframe I am using:



Asked By: Fixer



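Use Series.value_counts with normalize=True to get each value's share of the total rows as a fraction, map those shares back onto the column, and keep only the rows whose share is greater than or equal to 0.05 using boolean indexing. This removes the groups below 5%: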

percentage = crime.OffenseDescription.value_counts(normalize=True) 

crime[crime['OffenseDescription'].map(percentage) >= 0.05]
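
For a general version with an adjustable threshold, here is a minimal self-contained sketch of the same value_counts/map idea. The toy `crime` frame, its values, and the helper name `drop_rare` are assumptions made up for illustration; they are not taken from the asker's data.

```python
import pandas as pd

# Toy stand-in for the asker's `crime` frame; the values are invented.
crime = pd.DataFrame({
    "OffenseDescription": ["THEFT"] * 25 + ["ASSAULT"] * 14 + ["ARSON"] * 1
})


def drop_rare(df, column, min_share=0.05):
    """Keep rows whose value in `column` makes up at least `min_share` of rows.

    `drop_rare` is a hypothetical helper, not a pandas function.
    """
    # Maps each distinct value to its fraction of the (non-missing) rows.
    share = df[column].value_counts(normalize=True)
    # Look up every row's share and keep the rows at or above the threshold.
    # Rows with a missing value map to NaN, compare as False, and are dropped.
    return df[df[column].map(share) >= min_share]


filtered = drop_rare(crime, "OffenseDescription", min_share=0.05)
print(filtered["OffenseDescription"].value_counts())
```

Here ARSON is 1 of 40 rows (2.5%), so its row is dropped while THEFT and ASSAULT remain. Note that the threshold is a fraction, so 5% is 0.05; the attempt in the question multiplies by 100 and then compares against 0.05, which would test for 0.05% instead. Calling .sum() on a column of strings also concatenates them rather than counting rows, which is a likely cause of the slowdown on a large frame.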
Answered By: jezrael