How to calculate the sum of all columns of a 2D numpy array (efficiently)
Question:
Let’s say I have the following 2D numpy array consisting of four rows and three columns:
>>> a = numpy.arange(12).reshape(4,3)
>>> print(a)
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
What would be an efficient way to generate a 1D array that contains the sum of all columns (like [18, 22, 26])? Can this be done without having to loop through all columns?
Answers:
Check out the documentation for numpy.sum, paying particular attention to the axis parameter. To sum over columns:
>>> import numpy as np
>>> a = np.arange(12).reshape(4,3)
>>> a.sum(axis=0)
array([18, 22, 26])
Or, to sum over rows:
>>> a.sum(axis=1)
array([ 3, 12, 21, 30])
Other aggregate functions, such as numpy.mean, numpy.cumsum and numpy.std, also take the axis parameter.
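As a small illustration of the point above, here is a sketch that applies those three functions to the same 4x3 array from the question. The variable names (col_means, col_cumsum, row_std) are made up for this example.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# Mean of each column (the row axis is collapsed).
col_means = a.mean(axis=0)

# Running total down each column; cumsum keeps the (4, 3) shape.
col_cumsum = a.cumsum(axis=0)

# Population standard deviation of each row (the column axis is collapsed).
row_std = a.std(axis=1)

print(col_means)   # [4.5 5.5 6.5]
print(col_cumsum[-1])  # last row of the running total equals the column sums
print(row_std)
```

Note that cumsum does not reduce the array; the axis only tells it in which direction to accumulate.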
From the Tentative Numpy Tutorial:
Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class. By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array.
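To make the quoted behaviour concrete, here is a minimal sketch using the array from the question. It contrasts the default (no axis, so every element is summed) with the two axis choices; the variable names are only for illustration.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# No axis given: the array is treated like a flat list of numbers.
total = a.sum()

# axis=0: one result per column.
col_sums = a.sum(axis=0)

# axis=1: one result per row.
row_sums = a.sum(axis=1)

print(total)     # 66
print(col_sums)  # [18 22 26]
print(row_sums)  # [ 3 12 21 30]
```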
Use numpy.sum. For your case, it is
col_sums = a.sum(axis=0)
Use the axis argument:
>>> numpy.sum(a, axis=0)
array([18, 22, 26])
The NumPy sum function takes an optional axis argument that specifies along which axis you would like the sum performed:
>>> a = numpy.arange(12).reshape(4,3)
>>> a.sum(0)
array([18, 22, 26])
Or, equivalently:
>>> numpy.sum(a, 0)
array([18, 22, 26])
Other alternatives for summing the columns are
numpy.einsum('ij->j', a)
and
numpy.dot(a.T, numpy.ones(a.shape[0]))
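A quick sanity check that these alternatives agree with a.sum(axis=0), sketched on the array from the question. One thing worth knowing: numpy.ones defaults to float64, so the dot variant returns floats unless you pass a dtype (the dtype=a.dtype line below is my addition, not part of the answer above).

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

via_sum = a.sum(axis=0)
via_einsum = np.einsum("ij->j", a)

# np.ones() is float64 by default, so this result is a float array.
via_dot = np.dot(a.T, np.ones(a.shape[0]))

# Passing the array's own dtype keeps the result an integer array.
via_dot_int = np.dot(a.T, np.ones(a.shape[0], dtype=a.dtype))

print(via_sum, via_einsum, via_dot, via_dot_int)
```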
If the number of rows and columns is in the same order of magnitude, all of the possibilities are roughly equally fast.
If there are only a few columns, however, both the einsum and the dot solutions significantly outperform numpy’s sum (the benchmark plots use a log scale).
Code to reproduce the plots:
import numpy
import perfplot


# Each kernel reduces along axis 1, i.e. across the columns of each row.
def numpy_sum(a):
    return numpy.sum(a, axis=1)


def einsum(a):
    return numpy.einsum('ij->i', a)


def dot_ones(a):
    return numpy.dot(a, numpy.ones(a.shape[1]))


perfplot.save(
    "out1.png",
    # setup=lambda n: numpy.random.rand(n, n),  # rows and columns of similar size
    setup=lambda n: numpy.random.rand(n, 3),  # only a few columns
    n_range=[2**k for k in range(15)],
    kernels=[numpy_sum, einsum, dot_ones],
    logx=True,
    logy=True,
    xlabel='len(a)',
)
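If you do not have perfplot installed, a rough comparison can be sketched with the standard-library timeit module instead. This is not the benchmark that produced the plots: the array size (10,000 x 3), the repeat counts and the names kernels, results and timings are all assumptions chosen for the example, and the numbers will vary by machine and NumPy build.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
a = rng.random((10_000, 3))  # many rows, only a few columns

# Same three kernels as in the perfplot script, reducing along axis 1.
kernels = {
    "sum": lambda: np.sum(a, axis=1),
    "einsum": lambda: np.einsum("ij->i", a),
    "dot": lambda: np.dot(a, np.ones(a.shape[1])),
}

# Run each kernel once to confirm they compute the same thing.
results = {name: fn() for name, fn in kernels.items()}

# Best of 3 runs, 100 calls per run.
calls = 100
timings = {
    name: min(timeit.repeat(fn, number=calls, repeat=3))
    for name, fn in kernels.items()
}

for name, seconds in timings.items():
    print(f"{name:>6}: {seconds / calls * 1e6:.1f} us per call")
```

Taking the minimum over repeats is the usual way to reduce noise from other processes; it says nothing about how the kernels scale, which is what the perfplot sweep over n is for.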
a.sum(0) should solve the problem. a is a 2D np.array, and you will get the sum of each column. axis=0 is the dimension that points downwards and axis=1 the one that points to the right.
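One way to keep the directions straight is to look at the shapes: the axis you pass is the one that disappears from the result. A minimal sketch on the array from the question (the keepdims line is an extra I added to show how to keep the result two-dimensional):

```python
import numpy as np

a = np.arange(12).reshape(4, 3)   # shape (4, 3): 4 rows, 3 columns

down = a.sum(axis=0)      # collapses the 4 rows, leaving one value per column
across = a.sum(axis=1)    # collapses the 3 columns, leaving one value per row

# keepdims=True keeps the collapsed axis with length 1.
kept = a.sum(axis=0, keepdims=True)

print(a.shape, down.shape, across.shape, kept.shape)
# (4, 3) (3,) (4,) (1, 3)
```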