Why does map return a map object instead of a list in Python 3?
Question:
I am interested in understanding the new language design of Python 3.x.
I do enjoy, in Python 2.7, the function map:
Python 2.7.12
In[2]: map(lambda x: x+1, [1,2,3])
Out[2]: [2, 3, 4]
However, in Python 3.x things have changed:
Python 3.5.1
In[2]: map(lambda x: x+1, [1,2,3])
Out[2]: <map at 0x4218390>
I understand the how, but I could not find a reference to the why. Why did the language designers make this choice, which, in my opinion, introduces a great deal of pain? Was this to arm-wrestle developers into sticking to list comprehensions?
IMO, lists can naturally be thought of as functors, and I have somehow been taught to think of them this way:
fmap :: (a -> b) -> f a -> f b
Answers:
Because it returns an iterator, it avoids storing the full-size list in memory, so you can iterate over it later without paying that memory cost. You may not even need the full list, only the part of it up to the point where your condition is met.
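A minimal sketch of the "only the part you need" point. The square() helper and the calls list are hypothetical, added only to make the laziness visible: map() over the infinite itertools.count(1) is safe because values are computed only when requested, and next() stops at the first match.

```python
import itertools

calls = []  # inputs that square() actually received (for illustration only)


def square(x):
    """Hypothetical mapped function that logs each call."""
    calls.append(x)
    return x * x


def first_square_over(limit):
    # Lazy in Python 3: nothing is computed when the map object is created.
    squares = map(square, itertools.count(1))
    # next() pulls items one at a time and stops at the first match.
    return next(sq for sq in squares if sq > limit)


print(first_square_over(50))  # 64
print(calls)                  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Only eight squares are ever computed; with Python 2's eager map() the same code would try to build an infinite list.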
You may find the docs on iterators useful; iterators are awesome. From the Python glossary:
An object representing a stream of data. Repeated calls to the iterator’s __next__() method (or passing it to the built-in function next()) return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its __next__() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.
In Python 3 many functions (not just map but also zip, range and others) return a lazy object, such as an iterator, rather than the full list. You might want an iterator (e.g. to avoid holding the whole list in memory) or you might want a list (e.g. to be able to index).
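A quick way to see which of these objects are true iterators, using the abstract base classes from collections.abc. Note that range() actually returns a lazy sequence rather than an iterator, so it supports indexing and len(), while map and zip objects do not; calling list() is the usual way to get something indexable.

```python
from collections.abc import Iterator, Sequence

pairs = zip([1, 2, 3], "abc")
numbers = range(10)
labels = map(str, [1, 2, 3])

print(isinstance(pairs, Iterator))     # True
print(isinstance(labels, Iterator))    # True
print(isinstance(numbers, Iterator))   # False: range is a lazy sequence
print(isinstance(numbers, Sequence), numbers[3], len(numbers))  # True 3 10

try:
    labels[0]                          # map objects cannot be indexed
except TypeError as exc:
    print("TypeError:", exc)

# The failed indexing attempt consumed nothing, so all three items remain.
print(list(labels)[0])                 # prints 1 (the string '1')
```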
However, I think the key reason for the change in Python 3 is that, while it is trivial to convert an iterator to a list using list(some_iterator), the reverse, iter(some_list), does not achieve the desired outcome, because the full list has already been built and held in memory.
For example, in Python 3 list(range(n)) works just fine, as there is little cost to building the range object and then converting it to a list. However, in Python 2 iter(range(n)) does not save any memory, because the full list is constructed by range() before the iterator is built.
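The memory asymmetry can be observed with sys.getsizeof. This is a rough sketch: getsizeof reports only the container object itself (not the integers it refers to), and exact byte counts depend on the Python version and platform, so only the relative sizes are meaningful.

```python
import sys

n = 10**6
lazy_range = range(n)
lazy_map = map(abs, range(n))
eager_list = list(range(n))

# Sizes of the container objects only; exact numbers vary by build.
print(sys.getsizeof(lazy_range))  # small, independent of n
print(sys.getsizeof(lazy_map))    # small, independent of n
print(sys.getsizeof(eager_list))  # grows with n (one pointer per element)

# iter() over the list is itself a small object, but the list it walks
# over is already fully built and stays in memory.
it = iter(eager_list)
print(sys.getsizeof(it))
```

Going from lazy to eager is one list() call; going from eager to lazy with iter() cannot give back the memory already spent on the list.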
Therefore, in Python 2, separate functions are required to create an iterator rather than a list, such as itertools.imap for map (although they’re not quite equivalent), xrange for range, and itertools.izip for zip. By contrast, Python 3 needs only a single function, since a list() call creates the full list when required.
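In practice the Python 3 pattern looks like this: use the lazy built-in by default and wrap it in list() only where a real list is needed. The comments name the closest Python 2 counterparts; as noted above, they are not exact equivalents in every edge case.

```python
import itertools

words = ["alpha", "beta", "gamma"]

# Lazy by default (closest Python 2 counterparts: itertools.imap,
# itertools.izip and xrange).
lengths = map(len, words)
pairs = zip(words, itertools.count())  # stops when words runs out
indices = range(len(words))

# Eager on request: list() materialises the values only here.
print(list(lengths))   # [5, 4, 5]
print(list(pairs))     # [('alpha', 0), ('beta', 1), ('gamma', 2)]
print(list(indices))   # [0, 1, 2]
```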
I think the reason map still exists at all, when generator expressions also exist, is that it can take multiple iterable arguments that are all looped over in parallel and passed into the function:
>>> list(map(min, [1,2,3,4], [0,10,0,10]))
[0, 2, 0, 4]
That’s slightly easier than using zip:
>>> list(min(x, y) for x, y in zip([1,2,3,4], [0,10,0,10]))
[0, 2, 0, 4]
Otherwise, it simply doesn’t add anything over generator expressions.
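Another small sketch of the multi-iterable form, this time with inputs of different lengths. Both spellings stop at the shortest input, which matters later when porting Python 2 code (Python 2's map padded the shorter inputs instead).

```python
import operator

a = [1, 2, 3, 4]
b = [10, 20, 30]  # deliberately one element shorter

# map() accepts several iterables and calls operator.add(x, y) pairwise.
via_map = list(map(operator.add, a, b))

# The generator-expression spelling needs an explicit zip().
via_genexp = list(x + y for x, y in zip(a, b))

print(via_map)     # [11, 22, 33]  (stops at the shortest input)
print(via_genexp)  # [11, 22, 33]
```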
Guido answers this question here: “since creating a list would just be wasteful”. He also says that, when map() is called only for the side effects of the function, the correct transformation is to use a regular for loop.
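A common porting trap is calling map() only for its side effects. In Python 3 the map object is created and thrown away without the function ever running. This sketch uses two hypothetical helper functions that append to a list so the difference is observable.

```python
def record_with_map(items):
    log = []
    map(log.append, items)  # Python 3: creates a lazy map object and discards it
    return log              # still empty: log.append was never called


def record_with_loop(items):
    log = []
    for item in items:      # a regular for loop runs the side effect immediately
        log.append(item)
    return log


print(record_with_map(["a", "b"]))   # []
print(record_with_loop(["a", "b"]))  # ['a', 'b']
```

Wrapping the call in list() would also force evaluation, but it builds a throwaway list of None values, which is the waste the quote refers to.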
Converting map() from 2 to 3 might not just be a simple case of sticking a list() around it. Guido also says:
“If the input sequences are not of equal length, map() will stop at the termination of the shortest of the sequences. For full compatibility with map() from Python 2.x, also wrap the sequences in itertools.zip_longest(), e.g. map(func, *sequences) becomes list(map(func, itertools.zip_longest(*sequences))).”
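A sketch of Python 2-style map() behaviour on top of Python 3, using a hypothetical py2_style_map helper and a hypothetical add_or_none function. One detail to watch: read literally, map(func, itertools.zip_longest(*sequences)) hands func a single tuple per call, so for a func that takes one argument per sequence, itertools.starmap is what unpacks each tuple. Python 2's special map(None, ...) form is not reproduced here.

```python
import itertools


def py2_style_map(func, *sequences):
    """Hypothetical helper: pad shorter sequences with None (as Python 2's
    map() did) and return a list instead of a lazy map object."""
    return list(itertools.starmap(func, itertools.zip_longest(*sequences)))


def add_or_none(x, y):
    # Hypothetical function that treats a missing value (None) as 0.
    return (x or 0) + (y or 0)


print(py2_style_map(add_or_none, [1, 2, 3], [10, 20]))  # [11, 22, 3]
print(list(map(add_or_none, [1, 2, 3], [10, 20])))      # [11, 22]

# Without starmap, func receives each padded tuple as one argument:
print(list(map(len, itertools.zip_longest([1, 2, 3], [10, 20]))))  # [2, 2, 2]
```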