filtering pandas df when value is in a list
Question:
Let's say I have the following pandas DataFrame. I would like to filter the rows where, for example, the name is Julia. But how can I filter when the names are stored in a list?
| Index | Department | Names                 |
|-------|------------|-----------------------|
| 0     | History    | [Carl, Julia, Jens]   |
| 1     | Economics  | [Maggie, Lisa, Jules] |
| 2     | Physics    | [Freddy, Mark, Rolf]  |
I have tried df.contains(), but I receive an error message.
Any ideas?
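To make the answers below reproducible, here is a minimal sketch that rebuilds the table, assuming the columns are literally named Department and Names and that each Names cell holds a Python list of strings. It also shows why df.contains() fails: DataFrame has no contains method. The string method of that name lives on the .str accessor of a Series and is designed for string elements rather than lists.

```python
import pandas as pd

# Rebuild the question's table: one list of names per department.
df = pd.DataFrame(
    {
        "Department": ["History", "Economics", "Physics"],
        "Names": [
            ["Carl", "Julia", "Jens"],
            ["Maggie", "Lisa", "Jules"],
            ["Freddy", "Mark", "Rolf"],
        ],
    }
)

# DataFrame itself has no contains() method, so this raises AttributeError.
try:
    df.contains("Julia")
except AttributeError as err:
    print(err)  # e.g. 'DataFrame' object has no attribute 'contains'
```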
Answers:
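You can try a per-row check with a list comprehension, which gives .loc one boolean per row:
df.loc[["Julia" not in names for names in df.Names], :]
This keeps the rows whose list does not contain Julia; remove the not to keep only the rows that do. Hope it solves the problem.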
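Why the expression "Julia" not in df.Names does not work as a row filter on its own: the in operator on a pandas Series checks the index labels, not the values, and it returns a single bool for the whole column instead of one per row. A small sketch with a hypothetical two-row frame:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Department": ["History", "Economics"],
        "Names": [["Carl", "Julia", "Jens"], ["Maggie", "Lisa", "Jules"]],
    }
)

# `in` on a Series looks at the index labels (0, 1), not at the lists.
print("Julia" in df.Names)  # False: "Julia" is not an index label
print(0 in df.Names)        # True: 0 is an index label

# A per-row list of booleans is what .loc needs to select rows.
row_mask = ["Julia" in names for names in df.Names]
print(df.loc[row_mask, :])
```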
You could try converting each list into a string with str.join and then using str.contains():
import pandas as pd

df = pd.DataFrame([
    ['History', ['Carl', 'Julia', 'Jens']],
    ['Economics', ['Maggie', 'Lisa', 'Jules']],
    ['Physics', ['Freddy', 'Mark', 'Rolf']],
])
# No column names were given, so the list column is labeled 1
df[df[1].str.join(',').str.contains('Julia')]
         0                    1
0  History  [Carl, Julia, Jens]
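One caveat with the join/contains approach: str.contains performs a substring (by default regex) search on the joined string, so a partial name also matches. A minimal sketch, assuming no name ever contains a comma, that anchors the name between commas or the string ends; re.escape guards against regex metacharacters, and the variable name is illustrative:

```python
import re

import pandas as pd

df = pd.DataFrame([
    ["History", ["Carl", "Julia", "Jens"]],
    ["Economics", ["Maggie", "Lisa", "Jules"]],
])

joined = df[1].str.join(",")  # "Carl,Julia,Jens", "Maggie,Lisa,Jules"

# Plain substring search: "Jul" matches both "Julia" and "Jules".
loose = joined.str.contains("Jul")

# Require a whole comma-separated entry; (?:...) groups are non-capturing,
# which avoids pandas' warning about match groups in the pattern.
name = "Julia"
exact = joined.str.contains(rf"(?:^|,){re.escape(name)}(?:,|$)")

print(df[loose])
print(df[exact])
```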
You can also iterate over the DataFrame and check for the name in each row's list:
for index, row in df.iterrows():
    if 'Julia' in row['Names']:
        print(row)
Department                History
Names         [Carl, Julia, Jens]
Name: 0, dtype: object
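If you want a filtered DataFrame rather than printed rows, the same loop can collect the matching index labels and pass them to .loc. A sketch, assuming the column names Department and Names:

```python
import pandas as pd

df = pd.DataFrame(
    [["History", ["Carl", "Julia", "Jens"]],
     ["Economics", ["Maggie", "Lisa", "Jules"]],
     ["Physics", ["Freddy", "Mark", "Rolf"]]],
    columns=["Department", "Names"],
)

# Collect the index labels of matching rows instead of printing each row.
matching = [index for index, row in df.iterrows() if "Julia" in row["Names"]]
result = df.loc[matching]
print(result)
```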
You can use apply to create a mask and filter the DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame(
...     [['History', ['Carl', 'Julia', 'Jens']],
...      ['Economics', ['Maggie', 'Lisa', 'Jules']],
...      ['Physics', ['Freddy', 'Mark', 'Rolf']]],
...     columns=["Department", "Names"]
... )
>>> mask = df["Names"].apply(lambda x: "Julia" not in x)
>>> df[mask]
  Department                  Names
1  Economics  [Maggie, Lisa, Jules]
2    Physics   [Freddy, Mark, Rolf]
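The apply mask drops the Julia row; inverting it with ~ keeps only that row instead. As a swapped-in alternative that avoids apply, Series.explode puts each name on its own row while keeping the original index, so comparing and then grouping by that index gives one boolean per original row. A sketch under those assumptions:

```python
import pandas as pd

df = pd.DataFrame(
    [["History", ["Carl", "Julia", "Jens"]],
     ["Economics", ["Maggie", "Lisa", "Jules"]],
     ["Physics", ["Freddy", "Mark", "Rolf"]]],
    columns=["Department", "Names"],
)

# Keep the rows that DO contain Julia by inverting the apply-based mask.
mask = df["Names"].apply(lambda names: "Julia" not in names)
with_julia = df[~mask]

# explode: one row per name, same index label repeated for each list item.
# eq("Julia") flags matches; any() per index label collapses back to one row each.
hits = df["Names"].explode().eq("Julia").groupby(level=0).any()
also_with_julia = df[hits]

print(with_julia)
print(also_with_julia)
```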
This should also be faster than solutions using iterrows. As a comparison, I used:
filtered_df = df.copy()
for index, row in filtered_df.iterrows():
    if 'Julia' in row['Names']:
        filtered_df.drop(index=index, inplace=True)
filtered_df
When measured with %%timeit in a Jupyter notebook, the iterrows version takes almost twice as long as the apply version.
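To repeat the comparison outside Jupyter, here is a rough sketch using the standard timeit module. The frame size (the three rows repeated 200 times) and the function names are hypothetical, the list-comprehension variant is an extra candidate that the answer did not measure, and no timings are given because they depend on the machine and the pandas version:

```python
import timeit

import pandas as pd

# Hypothetical larger frame: the three example rows repeated 200 times.
df = pd.DataFrame(
    [["History", ["Carl", "Julia", "Jens"]],
     ["Economics", ["Maggie", "Lisa", "Jules"]],
     ["Physics", ["Freddy", "Mark", "Rolf"]]] * 200,
    columns=["Department", "Names"],
)


def with_apply(frame):
    # Boolean mask built with Series.apply.
    return frame[frame["Names"].apply(lambda names: "Julia" not in names)]


def with_iterrows(frame):
    # Drop matching rows one by one, as in the answer.
    filtered = frame.copy()
    for index, row in frame.iterrows():
        if "Julia" in row["Names"]:
            filtered.drop(index=index, inplace=True)
    return filtered


def with_list_comprehension(frame):
    # Boolean list built in plain Python.
    return frame[["Julia" not in names for names in frame["Names"]]]


for func in (with_apply, with_iterrows, with_list_comprehension):
    seconds = timeit.timeit(lambda: func(df), number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```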