Pandas: drop rows once a column value has appeared more than a value-specific number of times

Question:

I have a DataFrame that looks like the following:

import pandas as pd

t = {1: ['A','B'], 2: ['D','F'], 3: ['A','C'], 4: ['B','E'], 5: ['B','B'], 6: ['D','D'], 7: ['A','H']}
df = pd.DataFrame.from_dict(t, orient='index', columns=['X','Y'])
df

   X  Y
1  A  B
2  D  F
3  A  C
4  B  E
5  B  B
6  D  D
7  A  H

I then have a dictionary

d = {'A': 2, 'B': 1, 'D': 4}

What I would like to do is drop the rows of my DataFrame that correspond to the nth occurrence of a value in column X, where n is greater than the integer given in my dictionary for that value, while preserving the row order of my DataFrame. So with the above dictionary, the result of the operation should be a DataFrame that looks like

   X  Y
1  A  B
2  D  F
3  A  C
4  B  E
6  D  D

whereas with the dictionary

d = {'A': 1, 'B': 2, 'D': 1}

it should look like

   X  Y
1  A  B
2  D  F
4  B  E
5  B  B
Asked By: delgato


Answers:

You can use groupby.cumcount to number the rows within each X group (starting from 0), then compare that number to the threshold looked up with map:

mask = df.groupby('X').cumcount() < df['X'].map(d)

df[mask]
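A self-contained sketch of the same cumcount/map approach, wrapped in a helper so both dictionaries from the question can be checked. The helper name `keep_first_n` is hypothetical. Because `cumcount` starts at 0, a row is the (cumcount + 1)-th occurrence of its value and survives when cumcount < limit. One assumption beyond the original answer: a value of X that is missing from the dictionary maps to NaN, and any comparison with NaN is False, so the one-liner above would drop those rows. The sketch instead fills missing limits with infinity so such rows are kept; remove the `fillna` if dropping them is what you want.

```python
import pandas as pd

# Rebuild the example frame from the question
t = {1: ['A', 'B'], 2: ['D', 'F'], 3: ['A', 'C'], 4: ['B', 'E'],
     5: ['B', 'B'], 6: ['D', 'D'], 7: ['A', 'H']}
df = pd.DataFrame.from_dict(t, orient='index', columns=['X', 'Y'])


def keep_first_n(frame, limits, col='X'):
    """Hypothetical helper: keep at most limits[v] rows per value v of frame[col].

    Row order is preserved because boolean indexing keeps the original order.
    """
    # Per-row limit; values absent from `limits` get infinity (assumption: keep them)
    limit = frame[col].map(limits).fillna(float('inf'))
    # 0-based position of each row within its group of equal values
    position = frame.groupby(col).cumcount()
    return frame[position < limit]


print(keep_first_n(df, {'A': 2, 'B': 1, 'D': 4}))
print(keep_first_n(df, {'A': 1, 'B': 2, 'D': 1}))
```

The two calls should reproduce the two expected DataFrames from the question (rows 1, 2, 3, 4, 6 and rows 1, 2, 4, 5 respectively).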
Answered By: Quang Hoang