Averaging a list of lists column-wise in Python
Question:
I have a list of lists, something like:
data = [[240, 240, 239],
[250, 249, 237],
[242, 239, 237],
[240, 234, 233]]
I want to average each column, to get something like
[average_column_1, average_column_2, average_column_3]
My current code is not very elegant.
It is the naive approach: going through the list, keeping the sums in a separate container, and then dividing by the number of elements.
I think there must be a more Pythonic way to do this.
Any suggestions?
Thanks
Answers:
Use zip(), like so:
averages = [sum(col) / float(len(col)) for col in zip(*data)]
zip() takes multiple iterable arguments and returns slices of those iterables (as tuples) until one of the iterables is exhausted. In effect, it performs a transpose operation, as on a matrix.
>>> data = [[240, 240, 239],
... [250, 249, 237],
... [242, 239, 237],
... [240, 234, 233]]
>>> [list(col) for col in zip(*data)]
[[240, 250, 242, 240],
[240, 249, 239, 234],
[239, 237, 237, 233]]
By performing sum() on each of those slices, you effectively get the column-wise sum. Simply divide by the length of the column to get the mean.
Side point: In Python 2.x, division of integers floors the result by default, which is why float() is called to “promote” the divisor to a floating-point type before dividing.
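The note above that zip() stops as soon as one iterable runs out matters if the rows are not all the same length. Here is a minimal sketch of that behavior, assuming Python 3 and a hypothetical ragged input (not part of the original data); itertools.zip_longest is one possible way to keep the short columns, not the only one:

```python
from itertools import zip_longest

# Hypothetical ragged input: the last row is missing its third value.
ragged = [[240, 240, 239],
          [250, 249, 237],
          [242, 239]]

# zip() stops at the shortest row, so the third column is dropped entirely.
truncated = [list(col) for col in zip(*ragged)]
print(truncated)  # [[240, 250, 242], [240, 249, 239]]

# zip_longest() pads short rows with None; skip the padding when averaging.
padded_means = []
for col in zip_longest(*ragged):
    present = [x for x in col if x is not None]
    padded_means.append(sum(present) / len(present))
print(len(padded_means))  # 3
```

With rectangular data such as the question's, zip() and zip_longest() give the same columns, so this only matters for uneven rows.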
Pure Python:
from __future__ import division
def mean(a):
    return sum(a) / len(a)
a = [[240, 240, 239],
[250, 249, 237],
[242, 239, 237],
[240, 234, 233]]
print(list(map(mean, zip(*a))))
which prints:
[243.0, 240.5, 236.5]
NumPy:
import numpy
a = numpy.array([[240, 240, 239],
[250, 249, 237],
[242, 239, 237],
[240, 234, 233]])
print(numpy.mean(a, axis=0))
Python 3:
from statistics import mean
a = [[240, 240, 239],
[250, 249, 237],
[242, 239, 237],
[240, 234, 233]]
print(*map(mean, zip(*a)))
data = [[240, 240, 239],
[250, 249, 237],
[242, 239, 237],
[240, 234, 233]]
avg = [float(sum(col))/len(col) for col in zip(*data)]
# [243.0, 240.5, 236.5]
This works because zip(*data) groups the values by column. The float() call is only necessary on Python 2.x, which uses integer division unless from __future__ import division is used.
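To make the division point concrete, here is a small sketch assuming Python 3, where / is always true division and // is floor division (the result Python 2 gave for / on two integers). The variable names are only illustrative:

```python
# Second column of the example data.
col = (240, 249, 239, 234)

# True division: what you want for a mean.
true_mean = sum(col) / len(col)
print(true_mean)     # 240.5

# Floor division: mirrors Python 2's integer "/" and loses the fraction.
floored_mean = sum(col) // len(col)
print(floored_mean)  # 240
```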
import numpy as np
data = [[240, 240, 239],
[250, 249, 237],
[242, 239, 237],
[240, 234, 233]]
np.mean(data, axis=0)
# array([ 243. , 240.5, 236.5])
Seems to work.
You can use map and zip:
list(map(lambda x: sum(x)/len(x), zip(*data)))
# [243.0, 240.5, 236.5]
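If this is needed in more than one place, the zip-based approaches above could be wrapped in a small helper. This is only a sketch: column_means is a hypothetical name, statistics.fmean assumes Python 3.8 or later, and the row-length check is an added assumption about what you want for ragged input:

```python
from statistics import fmean

def column_means(rows):
    """Return the mean of each column of a rectangular list of lists."""
    if not rows:
        return []
    # zip() would silently truncate ragged rows, so reject them up front.
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")
    # fmean() always returns a float, even for integer input.
    return [fmean(col) for col in zip(*rows)]

data = [[240, 240, 239],
        [250, 249, 237],
        [242, 239, 237],
        [240, 234, 233]]
print(column_means(data))  # [243.0, 240.5, 236.5]
```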