How to filter a dataframe column having multiple values in Python

Question:

I have a data frame that sometimes has multiple values in cells like this:

df:
Fruits
apple, pineapple, mango
guava, blueberry, apple
custard-apple, cranberry
banana, kiwi, peach
apple

Now I want to keep only the rows where apple is one of the values in the cell.
So my output should look like this:

Fruits
apple, pineapple, mango
guava, blueberry, apple
apple

I used str.contains('apple'), but it does not return the result I want.

Can anyone help me with how I can get this result?

Asked By: Yash

||

Answers:

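You can use .query with .str.contains, excluding the hyphenated "-apple" explicitly. Note that this is a substring match, so a cell containing only "pineapple" would still be selected: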

import pandas as pd


data = {
    "Fruits": ["apple, pineapple, mango", "guava, blueberry, apple", "custard-apple, cranberry",
               "banana, kiwi, peach", "apple"]
}

df = pd.DataFrame(data)
df = df.query("Fruits.str.contains('apple') & ~Fruits.str.contains('-apple')").reset_index(drop=True)
print(df)

Output:

                    Fruits
0  apple, pineapple, mango
1  guava, blueberry, apple
2                    apple
Answered By: snake_charmer_775

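This selects the rows whose cell is exactly "apple" (it does not match cells that hold several comma-separated values):

apple = df[df.values == "apple"]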
print("The df with apple:", apple)

Answered By: Vivek Menon M

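You can split each cell on ", ", explode the resulting lists into one row per value, then compare each value with apple and keep the original rows where any value matches: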

mask = df['Fruits'].str.split(', ').explode().eq('apple').groupby(level=0).any()
df[mask]

Output:

                    Fruits
0  apple, pineapple, mango
1  guava, blueberry, apple
4                    apple
Answered By: Quang Hoang
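
For reference, here is a minimal self-contained sketch that contrasts the substring match from the question with an exact match on each comma-separated value. The helper name rows_with_item is hypothetical (it is not part of pandas), and the sketch assumes the values are separated by commas with optional surrounding spaces:

```python
import pandas as pd

df = pd.DataFrame({
    "Fruits": [
        "apple, pineapple, mango",
        "guava, blueberry, apple",
        "custard-apple, cranberry",
        "banana, kiwi, peach",
        "apple",
    ]
})

# Substring match: also selects "custard-apple, cranberry" (row 2),
# because "apple" occurs inside "custard-apple".
substring_hits = df[df["Fruits"].str.contains("apple")]


def rows_with_item(frame, column, item):
    """Hypothetical helper: keep rows where `item` is one whole entry of the cell."""
    # One entry per row after the split; explode keeps the original index labels.
    tokens = frame[column].str.split(",").explode().str.strip()
    # A row is kept if any of its entries equals `item` exactly.
    mask = tokens.eq(item).groupby(level=0).any()
    return frame[mask]


result = rows_with_item(df, "Fruits", "apple")
print(result)
```

With the sample data, substring_hits contains rows 0, 1, 2 and 4, while result contains only rows 0, 1 and 4. Stripping each value after the split means that inconsistent spacing around the commas does not affect the comparison.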