Python fluent filter, map, etc

Question:

I love Python. However, one thing that bugs me a bit is that I don’t know how to format functional operations in a fluent manner like I can in JavaScript.

Example (made up on the spot): Can you help me convert this to Python in a fluent-looking manner?

var even_set = [1,2,3,4,5]
.filter(function(x){return x%2 === 0;})
.map(function(x){
    console.log(x); // prints it for fun
    return x;
})
.reduce(function(num_set, val) {
    num_set[val] = true;
    return num_set; // reduce needs the accumulator returned
}, {});

I’d like to know if there are fluent options. Maybe a library?

In general, I’ve been using list comprehensions for most things, but it’s a real problem if I want to print.

E.g., how can I print every even number between 1 and 5 in Python 2.x using a list comprehension? (In Python 3 print() is a function, but in Python 2 it isn’t.) It’s also a bit annoying that a list is constructed and returned. I’d rather just use a for loop.

Asked By: ThinkBonobo

||

Answers:

The biggest dealbreaker for the code you wrote is that Python doesn’t support multiline anonymous functions. In Python 2, filter and map return lists (in Python 3 they return lazy iterators), so you can keep feeding the result of one into the next if you so desire. However, you’ll either have to define the functions ahead of time or use a lambda.
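
As a rough illustration of that point, the nested form below uses a lambda for the one-expression step and a named function for the step that needs two statements. The names log_and_pass and evens are made up for this sketch; list() is there because map returns a lazy iterator on Python 3.

```python
def log_and_pass(x):
    # A lambda can hold only one expression, so the print-then-return
    # step is written as a named function (hypothetical helper name).
    print(x)
    return x

# Calls nest inside-out: filter runs first, then map.
evens = list(map(log_and_pass, filter(lambda x: x % 2 == 0, [1, 2, 3, 4, 5])))
```

Note that the reading order is reversed compared with the JavaScript chain: the innermost call is the first step.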

Answered By: szxk

Arguments against doing this notwithstanding, here is a translation into Python of your JS code.

from __future__ import print_function
from functools import reduce

def print_and_return(x):
    print(x)
    return x

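def iseven(x):
    # True for even numbers (the original name "isodd" did not match the test)
    return x % 2 == 0

def add_to_dict(d, x):
    d[x] = True
    return d

# list() over the resulting dict yields its keys: [2, 4]
even_set = list(reduce(add_to_dict,
                       map(print_and_return,
                           filter(iseven, [1, 2, 3, 4, 5])), {}))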

It should work on both Python 2 and Python 3.

Comprehensions are the fluent python way of handling filter/map operations.

Your code would be something like:

def evenize(input_list):
    return [x for x in input_list if x % 2 == 0]

Comprehensions don’t work well with side effects like console logging, so do that in a separate loop. Chaining function calls isn’t really that common an idiom in python. Don’t expect that to be your bread and butter here. Python libraries tend to follow the “alter state or return a value, but not both” pattern. Some exceptions exist.
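
A minimal sketch of that split, assuming the same input list as the question: the comprehension stays pure, the printing goes in an ordinary loop, and the final mapping is built separately. The variable names are only illustrative.

```python
numbers = [1, 2, 3, 4, 5]

# Step 1: a pure comprehension that only filters.
evens = [x for x in numbers if x % 2 == 0]

# Step 2: side effects (printing) live in a plain loop.
for x in evens:
    print(x)

# Step 3: build the final mapping, mirroring the reduce step in the question.
even_set = {x: True for x in evens}
```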

Edit: On the plus side, python provides several flavors of comprehensions, which are awesome:

List comprehension: [x for x in range(3)] == [0, 1, 2]

Set comprehension: {x for x in range(3)} == {0, 1, 2}

Dict comprehension: {x: x**2 for x in range(3)} == {0: 0, 1: 1, 2: 4}

Generator comprehension (or generator expression): (x for x in range(3)) == <generator object <genexpr> at 0x10fc7dfa0>

With the generator comprehension, nothing has been evaluated yet, so it is a great way to prevent blowing up memory usage when pipelining operations on large collections.

For instance, if you try to do the following, even with python3 semantics for range:

for number in [x**2 for x in range(10000000000000000)]:
    print(number)

you will get a memory error trying to build the initial list. On the other hand, change the list comprehension into a generator comprehension:

for number in (x**2 for x in range(10**20)):
    print(number)

and there is no memory issue (it just takes forever to run). What happens is that the range object gets built (it only stores the start, stop and step values: 0, 10**20, and 1), and then the for-loop begins iterating over the genexp object. Effectively, the for-loop calls

GENEXP_ITERATOR = iter(genexp)
number = next(GENEXP_ITERATOR)
# run the loop one time
number = next(GENEXP_ITERATOR)
# run the loop one time
# etc.

(Note the GENEXP_ITERATOR object is not visible at the code level)

next(GENEXP_ITERATOR) tries to pull the first value out of genexp, which then starts iterating on the range object, pulls out one value, squares it, and yields out the value as the first number. The next time the for-loop calls next(GENEXP_ITERATOR), the generator expression pulls out the second value from the range object, squares it and yields it out for the second pass on the for-loop. The first set of numbers are no longer held in memory.
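
The same mechanics can be written out by hand. This is only a sketch of roughly what the for statement does, using a small range so it finishes; the names iterator and collected are invented for the illustration.

```python
genexp = (x**2 for x in range(5))  # small range so the sketch finishes

collected = []
iterator = iter(genexp)           # what the for statement does once, up front
while True:
    try:
        number = next(iterator)   # pull one squared value at a time
    except StopIteration:         # raised when the generator is exhausted
        break
    collected.append(number)      # stands in for the loop body
```

Only one squared value exists at a time inside the generator; the list here is kept purely so the result can be inspected.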

This means that no matter how many items in the generator comprehension, the memory usage remains constant. You can pass the generator expression to other generator expressions, and create long pipelines that never consume large amounts of memory.

import os
from pathlib import Path  # stdlib stand-in for the third-party path module

def pipeline(filenames):
    basepath = Path('/usr/share/stories')
    # Every step below is a lazy generator; nothing runs until the last line.
    fullpaths = (basepath / fn for fn in filenames)
    realfiles = (fn for fn in fullpaths if os.path.exists(fn))
    openfiles = (open(fn) for fn in realfiles)
    def read_and_close(file):
        # Read the first 100 characters, then release the file handle.
        output = file.read(100)
        file.close()
        return output
    prefixes = (read_and_close(file) for file in openfiles)
    noncliches = (prefix for prefix in prefixes
                  if not prefix.startswith('It was a dark and stormy night'))
    return {prefix[:32]: prefix for prefix in noncliches}

At any time, if you need a data structure for something, you can pass the generator comprehension to another comprehension type (as in the last line of this example), at which point, it will force the generators to evaluate all the data they have left, but unless you do that, the memory consumption will be limited to what happens in a single pass over the generators.
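
A small sketch of that forcing step, with made-up names: building a list from the end of a generator pipeline runs every stage, and afterwards the generators are spent.

```python
squares = (x**2 for x in range(5))            # nothing computed yet
evens = (n for n in squares if n % 2 == 0)    # still nothing computed

as_list = list(evens)    # forces the whole pipeline to run
leftover = list(evens)   # a generator can only be consumed once
```

Here as_list holds the even squares, while leftover is empty because the pipeline has already been drained. If the data is needed twice, materialize it once and reuse the list.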

Answered By: jcdyer

Generators, iterators, and itertools give added powers to chaining and filtering actions. But rather than remember (or look up) rarely used things, I gravitate toward helper functions and comprehensions.

For example in this case, take care of the logging with a helper function:

def echo(x):
    print(x)
    return x

Selecting even values is easy with the if clause of a comprehension. And since the final output is a dictionary, use that kind of comprehension:

In [118]: d={echo(x):True for x in s if x%2==0}
2
4

In [119]: d
Out[119]: {2: True, 4: True}

or to add these values to an existing dictionary, use update.

new_set.update({echo(x):True for x in s if x%2==0})

another way to write this is with an intermediate generator:

{y:True for y in (echo(x) for x in s if x%2==0)}

Or combine the echo and filter in one generator

def even(s):
    for x in s:
        if x%2==0:
            print(x)
            yield x

followed by a dict comp using it:

{y:True for y in even(s)}

Answered By: hpaulj

Update: Here’s yet another library/option, one that I adapted from a gist and that is available on PyPI as infixpy:

from infixpy import *
a = (Seq(range(1,51))
     .map(lambda x: x * 4)
     .filter(lambda x: x <= 170)
     .filter(lambda x: len(str(x)) == 2)
     .filter(lambda x: x % 20 == 0)
     .enumerate()
     .map(lambda x: 'Result[%d]=%s' %(x[0],x[1]))
     .mkstring(' .. '))
print(a)

pip3 install infixpy

Older

I am looking now at an answer that strikes closer to the heart of the question:

fluentpy https://pypi.org/project/fluentpy/ :

Here is the kind of method chaining for collections that a streams programmer (in scala, java, others) will appreciate:

import fluentpy as _
(
  _(range(1,50+1))
  .map(_.each * 4)
  .filter(_.each <= 170)
  .filter(lambda each: len(str(each))==2)
  .filter(lambda each: each % 20 == 0)
  .enumerate()
  .map(lambda each: 'Result[%d]=%s' %(each[0],each[1]))
  .join(',')
  .print()
)

And it works fine:

Result[0]=20,Result[1]=40,Result[2]=60,Result[3]=80

I am just now trying this out. It will be a very good day if this keeps working as shown above.

Update: Look at this: maybe Python can start to be more reasonable for one-line shell scripts:

python3 -m fluentpy "lib.sys.stdin.readlines().map(str.lower).map(print)"

Here it is in action on the command line:

$ echo -e "Hello World line1\nLine 2\nLine 3\nGoodbye" \
         | python3 -m fluentpy "lib.sys.stdin.readlines().map(str.lower).map(print)"

hello world line1

line 2

line 3

goodbye

There is an extra newline that should be cleaned up – but the gist of it is useful (to me anyways).
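
The extra blank lines most likely come from readlines() keeping each line’s trailing newline while print() adds one of its own. A plain-Python sketch of the cleanup, with io.StringIO standing in for sys.stdin (an assumption made so the example is self-contained):

```python
import io

# io.StringIO stands in for sys.stdin in this sketch.
fake_stdin = io.StringIO("Hello World line1\nLine 2\nLine 3\nGoodbye\n")

raw = fake_stdin.readlines()   # each line still ends with "\n"
cleaned = [line.rstrip("\n").lower() for line in raw]

for line in cleaned:
    print(line)                # print adds exactly one newline
```

In the fluentpy one-liner, the equivalent would be to strip each line before printing it.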

Answered By: WestCoastProjects

There’s a library that already does exactly what you are looking for: fluent syntax, lazy evaluation, and an order of operations that matches the order in which it’s written, plus other good stuff like multiprocess or multithreaded map/reduce.
It’s named pyxtension, and it’s production-ready and maintained on PyPI.
Your code would be rewritten in this form:

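from pyxtension.streams import stream

def console_log(x):
    print(x)
    return x

# Parentheses let the chain span several lines.
even_set = (
    stream([1, 2, 3, 4, 5])
    .filter(lambda x: x % 2 == 0)
    .map(console_log)
    # __setitem__ returns None, so "or num_set" hands the dict back to reduce;
    # the {} initial value assumes reduce accepts one, like functools.reduce.
    .reduce(lambda num_set, val: num_set.__setitem__(val, True) or num_set, {})
)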

Replace map with mpmap for multiprocessed map, or fastmap for multithreaded map.
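
To give a feel for what a fluent, lazy wrapper does under the hood, here is a minimal hypothetical sketch. The Chain class and add_key helper are invented for illustration; this is not pyxtension’s implementation, and it covers only filter, map and reduce.

```python
from functools import reduce


class Chain:
    """Hypothetical minimal fluent wrapper; not pyxtension's implementation."""

    def __init__(self, iterable):
        self._it = iterable

    def filter(self, pred):
        # Wrap lazily: nothing runs until the chain is consumed.
        return Chain(x for x in self._it if pred(x))

    def map(self, func):
        return Chain(func(x) for x in self._it)

    def reduce(self, func, initial):
        # Terminal step: this is where the lazy pipeline is actually consumed.
        return reduce(func, self._it, initial)


def add_key(acc, val):
    acc[val] = True
    return acc  # the accumulator must be returned for reduce to work


even_set = (
    Chain([1, 2, 3, 4, 5])
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x)
    .reduce(add_key, {})
)
```

Each method returns a new Chain wrapping a generator, which is what makes the calls chainable and keeps evaluation deferred until reduce runs.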

Answered By: asu

We can use Pyterator for this (disclaimer: I am the author).

We define the function that prints and returns (which I believe you can omit completely however).

def print_and_return(x):
    print(x)
    return x

then

from pyterator import iterate

even_dict = (
    iterate([1,2,3,4,5])
    .filter(lambda x: x%2==0)
    .map(print_and_return)
    .map(lambda x: (x, True))
    .to_dict()
)
# {2: True, 4: True}

where I have converted your reduce into a sequence of tuples that can be converted into a dictionary.
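
The same idea works in plain Python without any library, since dict() accepts any iterable of (key, value) pairs. A minimal sketch, with illustrative variable names:

```python
# Lazy stream of (key, value) tuples for the even numbers.
pairs = ((x, True) for x in [1, 2, 3, 4, 5] if x % 2 == 0)

# dict() consumes the pairs and builds the mapping in one step.
even_dict = dict(pairs)
```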

Answered By: remykarem