Select only columns that have at most N unique values

Question:

I want to count the number of unique values in each column and select only those columns which have fewer than 32 unique values.

I tried using
df.filter(nunique<32)
and

df[[ c for df.columns in df if c in c.nunique<32]] 

but neither works, because nunique is a method and not a function. I thought len(set()) would work and tried

df.apply(lambda x : len(set(x))

but that doesn’t work either. Any ideas, please? Thanks in advance!

Asked By: Bharat Ram Ammu


Answers:

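nunique is a method, so you have to call it with parentheses. Called on the entire DataFrame, it returns the number of unique values in each column. You can then keep only the matching columns by passing the resulting boolean mask to loc: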

df.loc[:, df.nunique() < 32]
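
For comparison, here is a rough sketch of how the two attempts from the question could be repaired so that they select the same columns. The small frame and the threshold of 3 are just illustrative choices, and the variable names are made up for this example. Note that len(set(x)) may treat missing values differently from nunique, so the equivalence is only assumed to hold for data without NaN.

```python
import pandas as pd

df = pd.DataFrame({'A': list('abbcde'), 'B': list('ababab')})

# List comprehension: iterate over column labels and call nunique on each column.
by_comprehension = df[[c for c in df.columns if df[c].nunique() < 3]]

# apply with len(set(...)): build a per-column count, then compare it to the threshold.
by_apply = df.loc[:, df.apply(lambda x: len(set(x))) < 3]

# The approach from this answer: one call to nunique on the whole frame.
by_nunique = df.loc[:, df.nunique() < 3]
```

All three variants keep only column B here; the nunique version is the shortest and avoids a Python-level loop over the columns.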

Minimal Verifiable Example

import pandas as pd

df = pd.DataFrame({'A': list('abbcde'), 'B': list('ababab')})
df
   A  B
0  a  a
1  b  b
2  b  a
3  c  b
4  d  a
5  e  b

df.nunique()
A    5
B    2
dtype: int64

df.loc[:, df.nunique() < 3]
   B
0  a
1  b
2  a
3  b
4  a
5  b
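
One thing worth knowing: by default nunique ignores missing values (dropna=True), so a column's count can change depending on whether NaN should count as a value. Below is a minimal sketch of a small wrapper that exposes that choice. The helper name select_low_cardinality and the extra column C are hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd


def select_low_cardinality(df, max_unique, dropna=True):
    """Return the columns of df that have fewer than max_unique distinct values.

    Hypothetical helper: dropna is forwarded to DataFrame.nunique, so
    dropna=False counts NaN as one additional distinct value.
    """
    counts = df.nunique(dropna=dropna)
    return df.loc[:, counts < max_unique]


df = pd.DataFrame({
    'A': list('abbcde'),
    'B': list('ababab'),
    'C': ['x', 'y', np.nan, 'x', 'y', np.nan],  # two values plus NaN
})

# NaN ignored: C has 2 distinct values and is kept.
default_cols = list(select_low_cardinality(df, 3).columns)

# NaN counted: C has 3 distinct values and is dropped.
strict_cols = list(select_low_cardinality(df, 3, dropna=False).columns)
```

With the default, columns B and C are selected; with dropna=False only B remains.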
Answered By: cs95

If anyone wants to do it in a method chaining fashion, you can:

df.loc[:, lambda x: x.nunique() < 3]
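
The callable form is useful because the lambda receives the DataFrame as it exists at that point in the chain, including columns added in earlier steps. A small sketch, where the assign step and the column name is_b are made up purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': list('abbcde'), 'B': list('ababab')})

result = (
    df
    # Hypothetical earlier step in the chain: add a boolean column.
    .assign(is_b=lambda x: x['A'] == 'b')
    # The lambda sees the intermediate frame, so is_b is evaluated too.
    .loc[:, lambda x: x.nunique() < 3]
)
```

Here result keeps B and the newly added is_b column, which would not be possible with df.nunique() evaluated on the original frame.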
Answered By: Adrien Pacifico