How to perform loc with one condition that includes two columns

Question:

I have a df with two columns, A and B. Both columns contain string values.

Example:

df_1 = pd.DataFrame(data={
    "A":['a','b','c'],
    "B":['a x d','z y w','q m c'] #string values not a list
})
print(df_1)

#output
   A      B
0  a  a x d
1  b  z y w
2  c  q m c

Now what I’m trying to do is use loc on df_1 to get all the rows where col B contains the string value in col A.

In this example, the output I want is the first and the third rows:

   A      B
0  a  a x d # 'a x d' contains value 'a'
2  c  q m c # 'q m c' contains value 'c'

I have tried different loc conditions but got an unhashable type: ‘Series’ error:

df_1.loc[df_1["B"].str.contains(df_1["A"])] #TypeError: unhashable type: 'Series'
df_1.loc[df_1["A"] in df_1["B"]] #TypeError: unhashable type: 'Series'

I really don’t want to use a for/while loop because of the size of the df.

Any idea how I can perform this?

Asked By: AvivSar

||

Answers:

You can try:

out = df_1.loc[df_1.apply(lambda x: x['A'] in x['B'], axis=1)]
print(out)

# Output
   A      B
0  a  a x d
2  c  q m c

Alternative with numpy:

import numpy as np

df_1.loc[np.apply_along_axis(lambda x: x[0] in x[1], axis=1, arr=df_1)]

Answered By: Corralien

There is no vectorized method to apply the in operator across two columns. You need to loop here:

mask = [a in b for a,b in zip(df_1['A'], df_1['B'])]

df_1.loc[mask]

Output:

   A      B
0  a  a x d
2  c  q m c

Comparison of speed (3000 rows):

# operator.contains
518 µs ± 4.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# list comprehension
554 µs ± 3.84 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# numpy.apply_along_axis
7.32 ms ± 58.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# apply
20.7 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
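These timings depend on the machine and the pandas version. Below is a minimal sketch for reproducing a comparison like this one. It assumes the 3000-row frame is built by repeating the example data, which the answer does not state. The names big, by_listcomp, and by_apply are made up for illustration.

```python
import timeit

import pandas as pd

df_1 = pd.DataFrame({
    "A": ["a", "b", "c"],
    "B": ["a x d", "z y w", "q m c"],
})

# Assumption: repeat the 3 example rows 1000 times to get 3000 rows.
big = pd.concat([df_1] * 1000, ignore_index=True)


def by_listcomp(df):
    # Row-wise substring test in a plain Python loop.
    return [a in b for a, b in zip(df["A"], df["B"])]


def by_apply(df):
    # Same test through DataFrame.apply, returned as a list for comparison.
    return df.apply(lambda x: x["A"] in x["B"], axis=1).tolist()


# Both approaches should produce the same mask.
assert by_listcomp(big) == by_apply(big)

for name, func in [("list comprehension", by_listcomp), ("apply", by_apply)]:
    # Average over 10 calls; absolute numbers will differ between machines.
    seconds = timeit.timeit(lambda: func(big), number=10) / 10
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```

Only the relative ordering is worth comparing against the figures above.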
Answered By: mozway

Cute!

from operator import contains

df_1.loc[map(contains, *map(df_1.get, ['B', 'A']))]

   A      B
0  a  a x d
2  c  q m c

And without the import but with an ugly dunder … /-:

df_1.loc[map(str.__contains__, *map(df_1.get, ['B', 'A']))]

Also, to the OP… don’t accept this answer. I was just golfing. You should never use this in production code as it is less than transparent due to the use of 2 maps and a splat.

And just to be super duper clear…

mozway’s is the better one

Answered By: piRSquared

When a condition needs two different columns in pandas, you can use .apply():

df.apply(lambda row: row['A'] in row['B'], axis = 1)

This creates a Boolean Series that you can use as your loc condition, which selects only the rows where the value in ‘A’ is contained in ‘B’:

df.loc[df.apply(lambda row: row['A'] in row['B'], axis = 1)]
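To make the intermediate step visible, here is a small self-contained sketch using the question's data. The variable names mask and result are only for illustration.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "A": ["a", "b", "c"],
    "B": ["a x d", "z y w", "q m c"],
})

# One Boolean per row: is the value in A a substring of the value in B?
mask = df_1.apply(lambda row: row["A"] in row["B"], axis=1)
print(mask.tolist())  # [True, False, True]

# loc keeps the rows where the mask is True.
result = df_1.loc[mask]
print(result.index.tolist())  # [0, 2]
```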
Answered By: Buggy

df_1[df_1.apply(lambda x: x['A'] in x['B'], axis=1)]

Answered By: Y U

import pandas as pd

df_1 = pd.DataFrame(data={
    "A":['a','b','c'],
    "B":['a x d','z y w','q m c'] #string values not a list
})

m = df_1['B'].str.contains('|'.join(df_1['A']), case=True, regex=True)
print(df_1.loc[m])

#    A      B
# 0  a  a x d
# 2  c  q m c
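Note that the joined pattern 'a|b|c' checks whether B contains any value from column A, not the value from the same row. It gives the expected result on the example data, but the two checks can disagree. A small sketch with made-up data (df_2 is hypothetical) shows the difference:

```python
import pandas as pd

df_2 = pd.DataFrame({
    "A": ["a", "b", "c"],
    "B": ["a x d", "z a w", "q m c"],  # row 1 contains 'a', but not its own 'b'
})

# Joined pattern: True if B contains ANY of the values in A.
joined = df_2["B"].str.contains("|".join(df_2["A"]), case=True, regex=True)

# Row-wise check: True only if B contains the A value from the same row.
rowwise = pd.Series(
    [a in b for a, b in zip(df_2["A"], df_2["B"])],
    index=df_2.index,
)

print(joined.tolist())   # [True, True, True]
print(rowwise.tolist())  # [True, False, True]
```

Values in A that contain regex metacharacters (for example '.' or '+') would also change what the joined pattern matches unless they are escaped first.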
Answered By: Laurent B.