pandas – filter rows with same value in many columns
Question:
I have a pandas DataFrame with many columns (100+, but the exact number doesn't matter).
Most rows have the same value in all columns, but some rows have more than one unique value.
For example, in the following table, rows 1 and 2 have the same value in all columns, and row 3 has more than one unique value across its columns.
column 1 | column 2 | column 3 | column 4 | … | column n |
---|---|---|---|---|---|
A | A | A | A | … | A |
A | A | A | A | … | A |
C | A | B | A | … | A |
I want to filter out the rows that have only one unique value across all of their columns. In the example above, I would keep only row 3.
I know how to filter rows based on values in specific columns using masks, but that doesn't seem to work in this case.
Any ideas?
Answers:
It looks like you want to filter based on nunique with boolean indexing:
out = df[df.nunique(axis=1).ne(1)]
Output:
  column 1 column 2 column 3 column 4 column n
2        C        A        B        A        A
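To reproduce the result above, here is a minimal, self-contained sketch. The sample frame is an assumption: it mirrors the table in the question with the elided `…` columns left out, and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's table
# (the elided "…" columns are simply omitted here).
df = pd.DataFrame(
    [
        ["A", "A", "A", "A", "A"],
        ["A", "A", "A", "A", "A"],
        ["C", "A", "B", "A", "A"],
    ],
    columns=["column 1", "column 2", "column 3", "column 4", "column n"],
)

# Count the distinct values in each row (axis=1 works row-wise),
# then keep only the rows where that count is not exactly 1.
out = df[df.nunique(axis=1).ne(1)]
print(out)
```

Only the row with index 2 remains, since it is the only one holding more than one distinct value.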
Intermediate:
df.nunique(axis=1).ne(1)
0    False
1    False
2     True
dtype: bool
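As a possible alternative (not part of the answer above), the same mask could be built by comparing every column to the first one, which avoids counting distinct values per row and may be cheaper on very wide frames. This is only a sketch on the same hypothetical sample data; `first` and `mask_alt` are illustrative names.

```python
import pandas as pd

# Same hypothetical sample data as in the question's table.
df = pd.DataFrame(
    [
        ["A", "A", "A", "A", "A"],
        ["A", "A", "A", "A", "A"],
        ["C", "A", "B", "A", "A"],
    ],
    columns=["column 1", "column 2", "column 3", "column 4", "column n"],
)

# Take the first column as the reference value for each row.
first = df.iloc[:, 0]

# axis=0 aligns `first` with the row index, so each cell is compared
# to the first value of its own row; any mismatch flags the row.
mask_alt = df.ne(first, axis=0).any(axis=1)

out_alt = df[mask_alt]
print(mask_alt)
```

The two approaches can disagree on rows that contain missing values: `nunique` ignores NaN by default (`dropna=True`), whereas an element-wise comparison treats NaN as different from any value.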