How to perform the equivalent of filter, map and reduce in Python?

Question:

Let’s assume there are two lists like:

list1 = ["num", "categ"]
all_names = ["col_num1", "col_num2", "col_num3", "col_categ1", "col_categ2", "col_bol1", "col_bol2", "num_extra_1", "num_extra_2", "categ_extra_1", "categ_extra_2"]

I am trying to create a new list by keeping only the elements that 1) do not contain "extra" and 2) contain at least one of the elements of list1.

For example, I expect to get something like this:

l=["col_num1", "col_num2", "col_num3", "col_categ1", "col_categ2"]

In PySpark this can be done using filter, map and reduce, but I am not sure what the equivalent is in Python. For now, I am doing this in two steps as shown below, but I think there might be a more straightforward way of doing this.

temp_list = [a for a in all_names if "extra" not in a]
print(temp_list)
['col_num1', 'col_num2', 'col_num3', 'col_categ1', 'col_categ2', 'col_bol1', 'col_bol2']

l = [b for b in temp_list for c in list1 if c in b]
print(l)
['col_num1', 'col_num2', 'col_num3', 'col_categ1', 'col_categ2']
Asked By: armin


Answers:

You can use something like this:

l = list(filter(lambda x: "extra" not in x and any(c in x for c in list1), all_names))
print(l)

The output would be:

['col_num1', 'col_num2', 'col_num3', 'col_categ1', 'col_categ2']
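
Since the question mentions filter, map and reduce together, here is a minimal sketch of how all three could be chained in plain Python with functools.reduce. The intermediate names (no_extra, flagged, selected) are made up for illustration, and this is only one possible arrangement, not necessarily how the PySpark version was written:

```python
from functools import reduce

list1 = ["num", "categ"]
all_names = ["col_num1", "col_num2", "col_num3", "col_categ1", "col_categ2",
             "col_bol1", "col_bol2", "num_extra_1", "num_extra_2",
             "categ_extra_1", "categ_extra_2"]

# filter: drop every name that contains "extra"
no_extra = filter(lambda name: "extra" not in name, all_names)

# map: pair each remaining name with a flag saying whether it
# contains at least one keyword from list1
flagged = map(lambda name: (name, any(key in name for key in list1)), no_extra)

# reduce: fold the flagged pairs into a list of the names whose flag is True
selected = reduce(
    lambda acc, pair: acc + [pair[0]] if pair[1] else acc,
    flagged,
    [],
)

print(selected)
```

For a simple selection like this one, the single filter call above is shorter; the reduce step mostly earns its place when the final result is an aggregate rather than a list.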
Answered By: Phoenix

You got the first part right.

Next, you want to include only those elements of temp_list that contain any of the elements in list1.

result = [b for b in temp_list if any(c in b for c in list1)]

which gives:

['col_num1', 'col_num2', 'col_num3', 'col_categ1', 'col_categ2']

Now that you understand the two steps involved here, you can combine both steps into one instead of creating an intermediate list. Since you want both conditions to be true, use a boolean and:

result = [a for a in all_names 
            if "extra" not in a 
            and any(c in a for c in list1)]

Note: The nested loop you have in for b in temp_list for c in list1 isn't quite right here, because you only want to select an item once, even if it contains both elements of list1. Consider, for example:

list1 = ["num", "categ"]
all_names = ["col_num1", "col_categ1", "col_num2_categ2", "categ_extra_1"]

# your code here
temp_list = [a for a in all_names if "extra" not in a]
l = [b for b in temp_list for c in list1 if c in b]

would give an l that contains "col_num2_categ2" twice, because the condition if c in b is true for two values of c in list1 when b = 'col_num2_categ2':

['col_num1', 'col_categ1', 'col_num2_categ2', 'col_num2_categ2']
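
To make the difference concrete, the sketch below runs the same input through both forms side by side. The variable names nested and with_any are only illustrative:

```python
list1 = ["num", "categ"]
all_names = ["col_num1", "col_categ1", "col_num2_categ2", "categ_extra_1"]

# Nested loop: emits a name once per matching keyword in list1
nested = [b for b in all_names if "extra" not in b
          for c in list1 if c in b]

# any(): emits a name at most once, however many keywords match
with_any = [a for a in all_names
            if "extra" not in a and any(c in a for c in list1)]

print(nested)    # "col_num2_categ2" appears twice
print(with_any)  # "col_num2_categ2" appears once
```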
Answered By: Pranav Hosangadi

Nested list comprehension as an alternative:

list1 = ["num", "categ"]
all_names = ["col_num1", "col_num2", "col_num3", "col_categ1", "col_categ2", "col_bol1", "col_bol2", "num_extra_1", "num_extra_2", "categ_extra_1", "categ_extra_2"]

result = [a for a in all_names for c in list1 if 'extra' not in a and c in a]

print(result)

# ['col_num1', 'col_num2', 'col_num3', 'col_categ1', 'col_categ2']
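
One caveat: this nested form yields a name once for every element of list1 it contains, so a name matching both "num" and "categ" would be listed twice. If you want to keep the nested form anyway, one possible workaround (a sketch, using a made-up input and the illustrative name unique_result) is to drop duplicates afterwards while preserving order with dict.fromkeys:

```python
list1 = ["num", "categ"]
# Hypothetical input in which one name matches both keywords
all_names = ["col_num1", "col_num2_categ2", "categ_extra_1"]

result = [a for a in all_names for c in list1 if "extra" not in a and c in a]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this removes repeats without reordering the names
unique_result = list(dict.fromkeys(result))

print(result)
print(unique_result)
```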
Answered By: Arifa Chan