Drop pandas rows based on percentage of valid data

Question:

I have a pandas DataFrame that looks like this:

Date_Time level
2018-02-12 13:22:27 5
2018-02-12 13:17:27 7
2018-02-12 13:12:27 2
2018-02-12 13:07:27 6
2018-02-13 13:12:27 4
2018-02-13 13:17:27 5

How do I remove all rows for a date that has fewer than 4 entries?
For example, since 2018-02-13 has fewer than 4 entries, its rows should be removed, giving this table:

Date_Time level
2018-02-12 13:22:27 5
2018-02-12 13:17:27 7
2018-02-12 13:12:27 2
2018-02-12 13:07:27 6

I tried using a for loop, but that takes too long to run.

Asked By: Shawn Mian


Answers:

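Group the rows by calendar date, use transform('count') to give every row the size of its date group, and then use ge(4) to keep only the rows whose date has at least 4 entries. This assumes Date_Time is already a datetime column (convert it with pd.to_datetime if it is stored as strings):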

df[df.groupby(df['Date_Time'].dt.date)['Date_Time'].transform('count').ge(4)]
Answered By: SomeDude
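
As a self-contained sketch of the approach above, the following rebuilds the sample data from the question and applies the same groupby/transform filter. The names `min_entries` and `counts` are introduced here for illustration only, and the threshold of 4 is an assumption taken from the `ge(4)` in the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date_Time": [
        "2018-02-12 13:22:27",
        "2018-02-12 13:17:27",
        "2018-02-12 13:12:27",
        "2018-02-12 13:07:27",
        "2018-02-13 13:12:27",
        "2018-02-13 13:17:27",
    ],
    "level": [5, 7, 2, 6, 4, 5],
})

# The .dt accessor only works on datetime-like columns, so convert first.
df["Date_Time"] = pd.to_datetime(df["Date_Time"])

# Assumed threshold: keep dates that have at least this many rows.
min_entries = 4

# For every row, count how many rows share its calendar date.
counts = df.groupby(df["Date_Time"].dt.date)["Date_Time"].transform("count")

# Boolean mask: True where the row's date has enough entries.
filtered = df[counts.ge(min_entries)]
print(filtered)
```

Because `transform` returns a result aligned with the original index, the mask can be applied directly without a loop. A possible alternative is `df.groupby(df["Date_Time"].dt.date).filter(lambda g: len(g) >= min_entries)`, though it calls a Python function once per group and is likely to be slower on data with many dates.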