Filtering a pandas DataFrame when the value is in a list

Question:

Let's say I have the following pandas DataFrame. I would like to filter for the row where, for example, the name is Julia. But how can I filter when the names are stored in a list?

| Index | Department | Names                 |
|-------|------------|-----------------------|
| 0     | History    | [Carl, Julia, Jens]   |
| 1     | Economics  | [Maggie, Lisa, Jules] |
| 2     | Physics    | [Freddy, Mark, Rolf]  |

I have tried df.contains(), but I receive an error message.

Any ideas?

Asked By: LightningStack575


Answers:

You can try:

df.loc[df.Names.apply(lambda names: "Julia" in names), :]

Hope it solves the problem.

Answered By: Lee G

You could try converting each list into a string with join and then using contains():

import pandas as pd

df = pd.DataFrame([
    ['History', ['Carl', 'Julia', 'Jens']],
    ['Economics', ['Maggie', 'Lisa', 'Jules']],
    ['Physics', ['Freddy', 'Mark', 'Rolf']],
])
# Join each list into one comma-separated string, then search that string
df[df[1].str.join(',').str.contains('Julia')]

    0   1
0   History [Carl, Julia, Jens]
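
One caveat: contains() matches substrings, so a search for 'Jul' would hit both 'Julia' and 'Jules'. A minimal sketch of a whole-name match on the same frame, using explode (the variable names here are only illustrative):

```python
import pandas as pd

df = pd.DataFrame([
    ['History', ['Carl', 'Julia', 'Jens']],
    ['Economics', ['Maggie', 'Lisa', 'Jules']],
    ['Physics', ['Freddy', 'Mark', 'Rolf']],
])

# Substring search: 'Jul' matches 'Julia' in row 0 and 'Jules' in row 1
substring_hits = df[df[1].str.join(',').str.contains('Jul')]

# One row per name; the original row label is repeated in the index
exploded = df[1].explode()

# True for each original row where at least one name equals 'Julia' exactly
has_julia = exploded.eq('Julia').groupby(level=0).any()

result = df[has_julia]
```

Here substring_hits contains rows 0 and 1, while result contains only row 0.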
Answered By: misterhuge

You can iterate over the DataFrame and check the list in each row:

for index, row in df.iterrows():
    if 'Julia' in row['Names']:
        print(row)


Department                History
Names         [Carl, Julia, Jens]
Name: 0, dtype: object
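
If you want the matching rows back as a DataFrame instead of printing them one by one, the same membership test can be used to build a boolean mask. A small sketch, assuming the column names from the question (mask and matches are just illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['History', 'Economics', 'Physics'],
    'Names': [['Carl', 'Julia', 'Jens'],
              ['Maggie', 'Lisa', 'Jules'],
              ['Freddy', 'Mark', 'Rolf']],
})

# Same test as in the loop, evaluated once per row
mask = [('Julia' in names) for names in df['Names']]

# Boolean indexing keeps only the rows where the mask is True
matches = df[mask]
```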
Answered By: cejottax

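You can use apply to create a boolean mask and filter the DataFrame. The mask below keeps the rows that do not contain "Julia"; use in instead of not in to keep the matching rows: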

>>> import pandas as pd
>>> df = pd.DataFrame(
...     [['History', ['Carl', 'Julia', 'Jens']],
...      ['Economics', ['Maggie', 'Lisa', 'Jules']],
...      ['Physics', ['Freddy', 'Mark', 'Rolf']]],
...     columns=["Department", "Names"]
... )
>>> mask = df["Names"].apply(lambda x: "Julia" not in x)
>>> df[mask]
    Department  Names
1   Economics   [Maggie, Lisa, Jules]
2   Physics     [Freddy, Mark, Rolf]

This should also be faster than solutions using iterrows. As a comparison I used:

filtered_df = df.copy()
for index, row in filtered_df.iterrows():
    if 'Julia' in row['Names']:
        filtered_df.drop(index=index, inplace=True)
filtered_df

When measured with %%timeit in a Jupyter notebook, the version with iterrows takes almost twice as long as the version using apply.
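
To repeat the comparison outside a notebook, something like the sketch below could be used. It relies on the standard-library timeit module instead of %%timeit. The function names, the frame size (the three example rows repeated 200 times), and the repeat count are arbitrary choices, and the iterrows variant collects the labels first and drops them in one call, which differs slightly from the loop above. Actual timings depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame(
    [['History', ['Carl', 'Julia', 'Jens']],
     ['Economics', ['Maggie', 'Lisa', 'Jules']],
     ['Physics', ['Freddy', 'Mark', 'Rolf']]],
    columns=["Department", "Names"],
)
# Repeat the example rows to get a larger frame with a unique index
df = pd.concat([base] * 200, ignore_index=True)


def with_apply(frame):
    # Keep the rows whose list does not contain "Julia"
    return frame[frame["Names"].apply(lambda x: "Julia" not in x)]


def with_iterrows(frame):
    # Collect the labels of rows containing "Julia", then drop them at once
    to_drop = [index for index, row in frame.iterrows()
               if "Julia" in row["Names"]]
    return frame.drop(index=to_drop)


apply_seconds = timeit.timeit(lambda: with_apply(df), number=3)
iterrows_seconds = timeit.timeit(lambda: with_iterrows(df), number=3)
print(f"apply:    {apply_seconds:.4f} s")
print(f"iterrows: {iterrows_seconds:.4f} s")
```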

Answered By: edd313