How to filter a pandas DataFrame by multiple columns
Question:
I would like to get the values from column n
where the values in a subset of the other columns are True.
For example, given this data frame:
import pandas as pd

t, f = True, False
data = [
[t, f, f, '1'],
[f, f, f, '2'],
[f, t, f, '3'],
[f, f, t, '4']
]
df = pd.DataFrame(data, columns=list("abcn"))
df as a table:
a b c n
0 True False False 1
1 False False False 2
2 False True False 3
3 False False True 4
The columns to search are a
and b
, and I wish to get the records from n
where these columns are True
. What I tried:
fcols = ("a", "b")
df[df[[*fcols]] == t].dropna(axis=0, how='all')
This gives me the right records, but with NaN
in column n
:
a b c n
0 True NaN NaN NaN
2 NaN True NaN NaN
I feel that I'm more or less close to the solution, but …
Answers:
Use any
to aggregate the booleans for your boolean indexing:
fcols = ("a", "b")
out = df[df[[*fcols]].eq(t).any(axis=1)]#.dropna(axis=0, how='all') # dropna not needed
Output:
a b c n
0 True False False 1
2 False True False 3
Intermediate indexing Series:
df[[*fcols]].eq(t).any(axis=1)
0 True
1 False
2 True
3 False
dtype: bool
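Since the question asks for the values of column n specifically, the same boolean mask can be passed to .loc to select just that column in one step. A small self-contained sketch of this idea:

```python
import pandas as pd

t, f = True, False
data = [
    [t, f, f, '1'],
    [f, f, f, '2'],
    [f, t, f, '3'],
    [f, f, t, '4'],
]
df = pd.DataFrame(data, columns=list("abcn"))

fcols = ("a", "b")
# Boolean mask: True for rows where any of the search columns is True
mask = df[[*fcols]].any(axis=1)
# Select only column n for the matching rows
n_values = df.loc[mask, 'n'].tolist()
print(n_values)  # ['1', '3']
```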
Use DataFrame.any
to test for at least one True
per row; the resulting boolean Series is then passed to boolean indexing
:
fcols = ("a", "b")
df = df[df[[*fcols]].eq(t).any(axis=1)]
# if the columns are already boolean, the comparison with True can be omitted
df = df[df[[*fcols]].any(axis=1)]
print (df)
a b c n
0 True False False 1
2 False True False 3
Details:
print (df[[*fcols]].eq(t).any(axis=1))
0 True
1 False
2 True
3 False
dtype: bool
I solved it this way:
df = df[df['a'] | df['b']]
In [5]: df
Out[5]:
a b c n
0 True False False 1
2 False True False 3
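The explicit df['a'] | df['b'] form above can be generalized to any number of columns by reducing with the OR operator, which is equivalent to the any(axis=1) solutions. A sketch, assuming the same example frame:

```python
import operator
from functools import reduce

import pandas as pd

t, f = True, False
data = [
    [t, f, f, '1'],
    [f, f, f, '2'],
    [f, t, f, '3'],
    [f, f, t, '4'],
]
df = pd.DataFrame(data, columns=list("abcn"))

fcols = ("a", "b")
# OR the boolean columns together, however many there are
mask = reduce(operator.or_, (df[c] for c in fcols))
out = df[mask]
print(out['n'].tolist())  # ['1', '3']
```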