Filter data based on conditions of another list

Question:

I have a list with names.

name= ["John Lewis", "Michael Armstrong", "Kurt Abela","Brian Watson", "Gregory Dubois"]

name = pd.DataFrame(name)

I have a pandas.DataFrame() called df:

df = pd.DataFrame(
    {'Name':['Karan Singh,John Lewis', 'Michael Armstrong, Fabian Schreiber', 'Roy Dalhuisen', 'Arya Yildirim,Gregory Dubois'],
    'ID':[23,22,21,24]})

Now I would like to filter df so that only names which occur in name also occur in df after filtering.

I tried this, but it didn’t work:

df = df[~df.index.isin(name.index)]
Asked By: Maurice


Answers:

You can use apply like this:

import pandas as pd

# names to keep (the list from the question)
name = ["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"]

df = {'Name': ['Karan Singh,John Lewis', 'Michael Armstrong, Fabian Schreiber', 'Roy Dalhuisen', 'Arya Yildirim,Gregory Dubois', 'hh,bb'], 'ID': [23, 22, 21, 24, 28]}
# dict to pandas
df = pd.DataFrame(df)
print(df)

def filter_names(row):
    # split the cell on commas and strip stray spaces around each name
    names = [n.strip() for n in row.split(',')]
    # keep the row if at least one of its names is in the list
    return any(n in name for n in names)

df_filtered = df[df['Name'].apply(filter_names)]
print(df_filtered)

Result

                                  Name  ID
0               Karan Singh,John Lewis  23
1  Michael Armstrong, Fabian Schreiber  22
2                        Roy Dalhuisen  21
3         Arya Yildirim,Gregory Dubois  24
4                                hh,bb  28

                                  Name  ID
0               Karan Singh,John Lewis  23
1  Michael Armstrong, Fabian Schreiber  22
3         Arya Yildirim,Gregory Dubois  24
Answered By: Hiren Namera

You can use isin like this:

filtered_df = df[df['Name'].isin(name)]
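One thing to keep in mind: isin compares each whole cell against the list, so it only matches when a cell holds exactly one name. A minimal sketch of the difference follows; the single frame is a made-up example and not data from the question.

```python
import pandas as pd

name = ["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"]

# hypothetical frame with one name per cell: isin works as expected here
single = pd.DataFrame({'Name': ['John Lewis', 'Roy Dalhuisen'], 'ID': [1, 2]})
single_out = single[single['Name'].isin(name)]

# frame from the question: every matching cell also holds a second name,
# so no cell is exactly equal to an entry of the list
combined = pd.DataFrame({
    'Name': ['Karan Singh,John Lewis', 'Michael Armstrong, Fabian Schreiber',
             'Roy Dalhuisen', 'Arya Yildirim,Gregory Dubois'],
    'ID': [23, 22, 21, 24]})
combined_out = combined[combined['Name'].isin(name)]

print(single_out)
print(len(combined_out))
```

With comma-separated cells, the names have to be split apart before they can be compared.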

Answered By: uozcan12

Example

import pandas as pd

name = ["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"]
data = {'Name': ['Karan Singh,John Lewis', 'Michael Armstrong, Fabian Schreiber', 'Roy Dalhuisen', 'Arya Yildirim,Gregory Dubois'], 'ID': [23, 22, 21, 24]}
df = pd.DataFrame(data)

df

    Name                                ID
0   Karan Singh,John Lewis              23
1   Michael Armstrong, Fabian Schreiber 22
2   Roy Dalhuisen                       21
3   Arya Yildirim,Gregory Dubois        24

I don't know exactly what you want, so here are two options.

Code1

out = (df.assign(Name=df['Name'].str.split(','))
         .explode('Name')[lambda x: x['Name'].isin(name)])

out

    Name                ID
0   John Lewis          23
1   Michael Armstrong   22
3   Gregory Dubois      24
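Splitting on a bare comma leaves any space after the comma attached to the following name, so a wanted name in second position after ", " would not match. A possible variation, sketched with made-up sample rows, strips each piece before comparing:

```python
import pandas as pd

name = ["John Lewis", "Michael Armstrong"]
# hypothetical rows where the wanted name follows a comma and a space
df = pd.DataFrame({
    'Name': ['Karan Singh, John Lewis', 'Fabian Schreiber , Michael Armstrong', 'Roy Dalhuisen'],
    'ID': [23, 22, 21]})

# one name per row, with surrounding whitespace removed; the index still
# points back at the original row
names_long = df['Name'].str.split(',').explode().str.strip()

# keep the matching names and re-attach the ID of the row they came from
matches = names_long[names_long.isin(name)]
out = matches.to_frame().join(df['ID'])
print(out)
```

This keeps one row per matching name, as Code1 does.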

Code2

out = df[df['Name'].str.contains('|'.join(name))]

out

    Name                                ID
0   Karan Singh,John Lewis              23
1   Michael Armstrong, Fabian Schreiber 22
3   Arya Yildirim,Gregory Dubois        24
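Note that str.contains treats the joined string as a regular expression by default. If a name can contain characters such as . or ( they should probably be escaped first. A minimal sketch, using invented names to show the effect:

```python
import re
import pandas as pd

# hypothetical names; the second one contains regex metacharacters
name = ["John Lewis", "J. Doe (Jr.)"]
df = pd.DataFrame({
    'Name': ['Karan Singh,John Lewis', 'J. Doe (Jr.),Roy Dalhuisen', 'JX Doe'],
    'ID': [23, 22, 21]})

# escape every name so it is matched literally, then join with the regex "or"
pattern = '|'.join(re.escape(n) for n in name)
out = df[df['Name'].str.contains(pattern, regex=True)]
print(out)
```

This is still a substring test, so a wanted name such as "John Lewis" would also match a cell containing "John Lewisham".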

Update

name= ["John Lewis", "Michael Armstrong", "Kurt Abela","Brian Watson", "Gregory Dubois"]
name = pd.DataFrame(name)

df = pd.DataFrame(
    {'Name':[['Karan Singh','John Lewis'], ['Michael Armstrong', 'Fabian Schreiber'], ['Roy Dalhuisen'], ['Arya Yildirim', 'Gregory Dubois']],
    'ID':[23,22,21,24]})

name

    0
0   John Lewis
1   Michael Armstrong
2   Kurt Abela
3   Brian Watson
4   Gregory Dubois

df

    Name                                    ID
0   [Karan Singh, John Lewis]               23
1   [Michael Armstrong, Fabian Schreiber]   22
2   [Roy Dalhuisen]                         21
3   [Arya Yildirim, Gregory Dubois]         24

Code

out = df[df['Name'].apply(lambda x: name[0].isin(x).sum() > 0)]

out

    Name                                    ID
0   [Karan Singh, John Lewis]               23
1   [Michael Armstrong, Fabian Schreiber]   22
3   [Arya Yildirim, Gregory Dubois]         24
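An alternative sketch for the list-valued column, assuming the same name and df as in the update: convert the wanted names to a set once and test each list against it, which avoids calling isin on the whole name column for every row. The variable names wanted and mask are only illustrative.

```python
import pandas as pd

name = pd.DataFrame(["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"])
df = pd.DataFrame(
    {'Name': [['Karan Singh', 'John Lewis'], ['Michael Armstrong', 'Fabian Schreiber'], ['Roy Dalhuisen'], ['Arya Yildirim', 'Gregory Dubois']],
     'ID': [23, 22, 21, 24]})

# build the lookup set once from the single column of name
wanted = set(name[0])

# a row is kept when its list shares at least one element with the set
mask = df['Name'].apply(lambda names: not wanted.isdisjoint(names))
out = df[mask]
print(out)
```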
Answered By: Panda Kim