Process a sequence in a java.util.stream manner in Python

Question:

Does anyone know how to write sequence processing in Python in the style of the Java Stream API? The idea is to write the operations in the order in which they happen:

myList.stream()
    .filter(condition)
    .map(action1)
    .map(action2)
    .collect(Collectors.toList());

Now, in Python I could do:

[action2(action1(item)) for item in my_list if condition(item)]

But that is the opposite order.

How could I write this in the correct order? Obviously I could use variables, but then I would have to find a name for each partial result.

Asked By: Adrien H

||

Answers:

You could write this yourself:

from collections import UserList


class JavaLike(UserList):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.iter = None

    def stream(self):
        self.iter = None

        return self

    def filter(self, function):
        self.iter = filter(function, self if self.iter is None else self.iter)

        return self

    def map(self, function):
        self.iter = map(function, self if self.iter is None else self.iter)

        return self

    def collect(self, collection_class=None):
        if collection_class is None:
            if self.iter is not None:
                ret = JavaLike(self.iter)
                self.iter = None

                return ret

            return JavaLike(self)

        return collection_class(self if self.iter is None else self.iter)

Then a similar syntax is possible:

>>> JavaLike(range(10)).stream().filter(lambda x: x % 2 == 0).map(str).collect(tuple)
('0', '2', '4', '6', '8')
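
One thing to be aware of: JavaLike keeps its pending pipeline in self.iter, so a single instance cannot hold two pipelines at once. A minimal sketch of a non-mutating variant follows, where every step returns a new wrapper. The class name Stream and the default of collecting into a list are assumptions made for illustration, not part of any library.

```python
class Stream:
    """Hypothetical lazy wrapper: each step returns a new Stream."""

    def __init__(self, iterable):
        self._iterable = iterable

    def filter(self, predicate):
        # filter() is lazy, so nothing is evaluated yet
        return Stream(filter(predicate, self._iterable))

    def map(self, function):
        # map() is lazy as well
        return Stream(map(function, self._iterable))

    def collect(self, collection_class=list):
        # The whole chain is consumed here, in the order it was written
        return collection_class(self._iterable)


evens_as_text = (
    Stream(range(10))
    .filter(lambda x: x % 2 == 0)
    .map(str)
    .collect(tuple)
)
print(evens_as_text)  # ('0', '2', '4', '6', '8')
```

Wrapping the chain in parentheses lets each step sit on its own line, as in the Java version.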

Answered By: ForceBru

There are at least two modules on PyPI for this: lazy-streams and pystreams.

Answered By: daphshez

There is a library that already does what you are looking for: evaluation is lazy and the operations run in the order in which they are written. It also offers extras such as multi-process and multi-threaded map/reduce.
It is named pyxtension. It is production-ready, covered by unit tests, maintained on PyPI, and released under the MIT license, so you can use it for free in any commercial project.
Your code would be rewritten in this form:

from pyxtension.streams import stream

# The parentheses let the method chain span several lines
result = (
    stream(myList)
    .filter(condition)
    .map(action1)
    .map(action2)
    .toList()
)

and

result = (
    stream(myList)
    .filter(condition)
    .mpmap(action1)    # multi-processing map
    .fastmap(action2)  # multi-threaded map
    .toList()
)

Note that the final toList() call does exactly what you would expect: it collects the data, as would happen with a Spark RDD.
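
For comparison, when adding a dependency is not an option, the same reading order can be approximated with the standard library alone. This is only a sketch: the helper name pipe is hypothetical, and condition, action1, action2 and my_list are stand-ins for the names in the question.

```python
from functools import partial, reduce


def pipe(iterable, *steps):
    """Feed `iterable` through each step in turn, left to right."""
    return reduce(lambda acc, step: step(acc), steps, iterable)


# Hypothetical stand-ins for the question's functions
def condition(item):
    return item % 2 == 0


def action1(item):
    return item + 1


def action2(item):
    return item * 10


my_list = [1, 2, 3, 4]

result = list(
    pipe(
        my_list,
        partial(filter, condition),
        partial(map, action1),
        partial(map, action2),
    )
)
print(result)  # [30, 50]
```

Because filter and map return lazy iterators, nothing is computed until list() consumes the final one.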

Answered By: asu

Wouldn’t reversed() do the trick?

[action2(action1(item)) for item in reversed(my_list) if condition(item)]
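
Note that reversed() changes the order in which the items are visited, not the order in which the operations are written, so the comprehension still reads inside-out. A small check makes the difference visible; condition, action1, action2 and my_list here are hypothetical stand-ins for the names in the question.

```python
# Hypothetical stand-ins for the question's functions
def condition(item):
    return item % 2 == 0


def action1(item):
    return item + 1


def action2(item):
    return item * 10


my_list = [1, 2, 3, 4]

plain = [action2(action1(item)) for item in my_list if condition(item)]
with_reversed = [
    action2(action1(item)) for item in reversed(my_list) if condition(item)
]

print(plain)          # [30, 50]
print(with_reversed)  # [50, 30]
```

The operations are applied in the same order in both cases; only the order of the results differs.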

Answered By: vcmsxs