How to filter dataframe by columns values combinations?

Question:

I have a dataframe:

col1   col2   col3
a       b       b
a       b       c
k       l       o
b       l       b

I want to keep only rows where col1 is "a", col2 is "b" and col3 is "b" or col1 is "k", col2 is "l" and col3 is "o". So desired result is:

col1   col2   col3
a       b       b
k       l       o

How to do that? i can write dt[(dt["col1"]=="a")&(dt["col2"]=="b")&(dt["col1"]=="b")] but what about second case? should i put it with or?

Asked By: gh1222

||

Answers:

abb = (df["col1"]=="a") & (df["col2"]=="b") & (df["col3"]=="b")
klo = (df["col1"]=="k") & (df["col2"]=="l") & (df["col3"]=="o")

df[(abb) | (klo)]

  col1 col2 col3
0    a    b    b
2    k    l    o

Alternatively, you could write something like this, just to avoid all those conditionals:

abb = 'abb'
klo = 'klo'

strings = [abb, klo]

def f(x):
    if ''.join(x) in strings:
        return True
    return False

df[df.apply(lambda x: f(x), axis=1)]

  col1 col2 col3
0    a    b    b
2    k    l    o

So, here we are applying a custom function to each row with df.apply. Inside the function we turn the row into a single string with str.join and we check if this string exists in our predefined list of strings. Finally, we use the resulting pd.Series with booleans to select from our df.

Answered By: ouroboros1

you can create a check_list then use pandas.apply with axis=1 and check in specific columns any list of check exist or not.

col = ['col1','col2','col3']
chk = [['a','b','b'],['k','l','o']]
msk = df.apply(lambda row : any((row[col]==c).all() for c in chk), axis=1)
df = df[msk]
print(df)

  col1 col2 col3
0    a    b    b
2    k    l    o
Answered By: I'mahdi