pandas how to get all rows with specific count of values

Question:

I have a dataframe

df = 

    C1 C2
    a.  2
    d.  8  
    d.  5  
    d.  5  
    b.  3
    b.  4
    c.  5
    a.  6
    b.  7

I want to add a new column, type: for every row whose C1 value occurs at most 2 times in the column, it should contain low; otherwise it should keep the original C1 value. So the new df will look like this:

df_new = 

    C1 C2 type
    a.  2  low
    d.  8  d
    d.  5  d
    d.  5  d
    b.  3  b
    b.  4  b
    c.  5  low
    a.  6  low
    b.  7  b

How can I do this?

I also want to get back a list of all the types that were low (['a', 'c'] here)

Thanks

Asked By: Cranjis


Answers:

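You can group by 'C1' with pandas.DataFrame.groupby and count the rows in each group. Then pass a lambda to the groupby's transform that returns 'low' for groups with at most two rows and the original value of the group otherwise (the [:-1] slice drops the trailing '.'). Alternatively, compute each row's group size with transform('size') and apply numpy.where to the result.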

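import numpy as np  # needed for the numpy.where variant below

# Option 1: transform with a lambda; groups with at most 2 rows become 'low',
# larger groups keep their C1 value without the trailing '.'
df['type'] = df.groupby('C1')['C1'].transform(lambda g: 'low' if len(g) <= 2 else g.iloc[0][:-1])

# Option 2: compute each row's group size, then choose the label with numpy.where
g = df.groupby('C1')['C1'].transform('size')
df['type'] = np.where(g <= 2, 'low', df['C1'].str[:-1])
print(df)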

Output:

   C1  C2 type
0  a.   2  low
1  d.   8    d
2  d.   5    d
3  d.   5    d
4  b.   3    b
5  b.   4    b
6  c.   5  low
7  a.   6  low
8  b.   7    b
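
The code above does not cover the second part of the question, the list of values that were labelled low. One possible sketch, not part of the original answer, is to count the values with value_counts and keep those that occur at most twice. The name low_values is only illustrative, and the DataFrame is rebuilt here from the question's data so the snippet runs on its own; the trailing '.' is stripped as in the answer above.

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    "C1": ["a.", "d.", "d.", "d.", "b.", "b.", "c.", "a.", "b."],
    "C2": [2, 8, 5, 5, 3, 4, 5, 6, 7],
})

# Count how often each C1 value occurs.
counts = df["C1"].value_counts()

# Keep the values that occur at most twice and drop the trailing '.'.
# sorted() returns a plain Python list in a deterministic order.
low_values = sorted(counts[counts <= 2].index.str[:-1])
print(low_values)
```

If the dots in C1 should be kept, dropping the .str[:-1] step would return the values as they appear in the column.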
Answered By: I'mahdi