How to get an element-wise boolean array indicating whether array elements of column1 exist in array column2? [PySpark/Python]

Question:

My PySpark DataFrame has two columns of type ArrayType(StringType).

I want to check which items the two columns share. Based on that, I want a result column that gives, index by index over col2, 1 if that item also exists in col1 and 0 otherwise.

I tried np.in1d() and np.isin(), but they give me an error because this is a PySpark DataFrame, not a NumPy array. I have been trying to figure this out for quite some time now, so any help will be appreciated!

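+----------------------------+------------------------------------+---------------+
|col1                        |col2                                |result col     |
+----------------------------+------------------------------------+---------------+
|[item1, item2, item3]       |[item5, item2, item3, item17]       |[0, 1, 1, 0]   |
|[item3, item5, item6, item9]|[item3, item2, item9, item5, item12]|[1, 0, 1, 1, 0]|
+----------------------------+------------------------------------+---------------+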
Asked By: user19457514

Answers:

You can use the transform and array_contains functions to check, for each element of col2, whether it appears in col1. Note that F.transform accepts a Python lambda only in PySpark 3.1 and later.

import pyspark.sql.functions as F

...
# For each element x of col2, emit 1 if col1 contains x, otherwise 0.
df = df.withColumn(
    'result col',
    F.transform('col2', lambda x: F.when(F.array_contains('col1', x), 1).otherwise(0))
)
df.show(truncate=False)

# +----------------------------+------------------------------------+---------------+
# |col1                        |col2                                |result col     |
# +----------------------------+------------------------------------+---------------+
# |[item1, item2, item3]       |[item5, item2, item3, item17]       |[0, 1, 1, 0]   |
# |[item3, item5, item6, item9]|[item3, item2, item9, item5, item12]|[1, 0, 1, 1, 0]|
# +----------------------------+------------------------------------+---------------+
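
To see the per-row logic without Spark, here is a minimal plain-Python sketch of what the transform/array_contains combination computes. The helper name membership_flags and the rows list are made up for illustration and are not part of PySpark. The sample data is copied from the question.

```python
def membership_flags(col1, col2):
    """Return one 0/1 flag per element of col2: 1 if it also occurs in col1."""
    lookup = set(col1)  # a set makes each membership check cheap
    return [1 if item in lookup else 0 for item in col2]


# Hypothetical stand-in for the two DataFrame rows shown in the question.
rows = [
    (["item1", "item2", "item3"], ["item5", "item2", "item3", "item17"]),
    (["item3", "item5", "item6", "item9"], ["item3", "item2", "item9", "item5", "item12"]),
]

results = [membership_flags(c1, c2) for c1, c2 in rows]
for flags in results:
    print(flags)
# [0, 1, 1, 0]
# [1, 0, 1, 1, 0]
```

This only illustrates the semantics: the output has one entry per element of col2, not col1. On a real DataFrame, the built-in column functions shown above are likely the better choice, since they keep the work inside Spark.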
Answered By: 过过招