Fastest way to filter index values based on list values from multiple columns in a Pandas DataFrame?

Question:

I have a data frame like the one below, with two columns, col1 and col2. From these two columns I want to filter a few values (pairwise combinations from two lists) and get their indexes. I wrote the logic for it, but it will be too slow when filtering a larger data frame. Is there a faster way to filter the data and get the list of indexes?

Data frame:

import pandas as pd
d = {'col1': [11, 20,90,80,30], 'col2': [30, 40,50,60,90]}
df = pd.DataFrame(data=d)
print(df)
   col1  col2
0    11    30
1    20    40
2    90    50
3    80    60
4    30    90

l1=[11,90,30]
l2=[30,50,90]
final_result=[]
for i,j in zip(l1,l2):
    res=df[(df['col1']==i) & (df['col2']==j)]
    final_result.append(res.index[0])
print(final_result)

[0, 2, 4]
Asked By: data en


Answers:

Here is one way to do it: merge the two DFs and keep the rows that exist in both.

# create a DF of the lists you'd like to match against
df2 = pd.DataFrame({'col1': l1, 'col2': l2})

# merge the two DFs, tagging each row's origin in the 'foundIn' column
df3 = df.merge(df2, how='left',
               on=['col1', 'col2'], indicator='foundIn')

# keep rows that appear in both DFs
out = df3[df3['foundIn'].eq('both')].index.to_list()
out
[0, 2, 4]
Answered By: Naveed
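Putting the merge approach together into one runnable sketch (variable names follow the thread above):

```python
import pandas as pd

df = pd.DataFrame({'col1': [11, 20, 90, 80, 30],
                   'col2': [30, 40, 50, 60, 90]})
l1 = [11, 90, 30]
l2 = [30, 50, 90]

# build a lookup frame from the two lists
df2 = pd.DataFrame({'col1': l1, 'col2': l2})

# left-merge with an indicator column; rows present in both frames
# are tagged 'both', rows only in df are tagged 'left_only'
merged = df.merge(df2, how='left', on=['col1', 'col2'], indicator='foundIn')
out = merged[merged['foundIn'].eq('both')].index.to_list()
print(out)  # [0, 2, 4]
```

Note that `merge` resets the index to a fresh RangeIndex; this matches `df`'s own index here only because `df` has a default RangeIndex. With an arbitrary index you would need to `reset_index()` first and recover the original labels afterwards.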

You can use:

condition_1 = df['col1'].astype(str).str.contains('|'.join(map(str, l1)))
condition_2 = df['col2'].astype(str).str.contains('|'.join(map(str, l2)))
final_result = df.loc[condition_1 & condition_2].index.to_list()
Answered By: Clegane
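A caveat on the `str.contains` approach: it does regex substring matching and tests each column independently, so it does not enforce the pairwise combinations the question asks for and can match extra rows. A hypothetical example (data chosen to trigger the problem) where it misfires:

```python
import pandas as pd

# 911 is NOT in l1, but the string '911' contains '11' as a substring
df = pd.DataFrame({'col1': [11, 911, 90], 'col2': [30, 30, 50]})
l1 = [11, 90]
l2 = [30, 50]

cond1 = df['col1'].astype(str).str.contains('|'.join(map(str, l1)))
cond2 = df['col2'].astype(str).str.contains('|'.join(map(str, l2)))
hits = df.loc[cond1 & cond2].index.to_list()
print(hits)  # [0, 1, 2] -- row 1 is a false positive
```

Anchoring the pattern (e.g. wrapping each value in `^...$`) fixes the substring issue but still checks the columns separately, so unintended (col1, col2) pairings can match.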

You can just use the underlying NumPy array and create the boolean index:

import numpy as np

mask = (df[['col1', 'col2']].values[:, None] == np.vstack([l1, l2]).T).all(-1).any(1)
# mask
# array([ True, False,  True, False,  True])

df.index[mask]
# prints
# Int64Index([0, 2, 4], dtype='int64')
Answered By: ThePyGuy
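The broadcasting trick above builds an n×m×2 comparison array. A similarly vectorized option (not from the thread; a sketch using `pd.MultiIndex.isin`) treats each (col1, col2) row as a tuple and tests membership against the list pairs:

```python
import pandas as pd

df = pd.DataFrame({'col1': [11, 20, 90, 80, 30],
                   'col2': [30, 40, 50, 60, 90]})
l1 = [11, 90, 30]
l2 = [30, 50, 90]

# view the two columns as a MultiIndex of (col1, col2) tuples,
# then test each row against the wanted pairs
pairs = list(zip(l1, l2))
mask = pd.MultiIndex.from_frame(df[['col1', 'col2']]).isin(pairs)
idx = df.index[mask].to_list()
print(idx)  # [0, 2, 4]
```

Unlike the independent per-column checks, this matches only the exact pairwise combinations, and it avoids materializing the full n×m broadcast array.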