numpy array row major and column major

Question:

I’m having trouble understanding how numpy stores its data. Consider the following:

>>> import numpy as np
>>> a = np.ndarray(shape=(2,3), order='F')
>>> for i in xrange(6): a.itemset(i, i+1)
... 
>>> a
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
>>> a.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

This says that a is column major (F_CONTIGUOUS) thus, internally, a should look like the following:

[1, 4, 2, 5, 3, 6]

This is exactly what this glossary states. What confuses me is that if I try to access the data of a in a linear fashion, I instead get:

>>> for i in xrange(6): print a.item(i)
... 
1.0
2.0
3.0
4.0
5.0
6.0

At this point I’m not sure what the F_CONTIGUOUS flag tells us since it does not honor the ordering. Apparently everything in python is row major and when we want to iterate in a linear fashion we can use the iterator flat.

The question is the following: given a list of numbers, say 1, 2, 3, 4, 5, 6, how can we create a numpy array of shape (2, 3) in column major order? That is, how can I get a matrix that looks like this:

array([[ 1.,  3.,  5.],
       [ 2.,  4.,  6.]])

I would really like to be able to iterate linearly over the list and place them into the newly created ndarray. The reason for this is because I will be reading files of multidimensional arrays set in column major order.

Asked By: jmlopez


Answers:

By default, numpy stores data in row major (C) order.

>>> a = np.array([[1,2,3,4], [5,6,7,8]])
>>> a.shape
(2, 4)
>>> a.shape = 4,2
>>> a
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

If you change the shape, the order of the data does not change.
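As a rough illustration of that point (a sketch only, with dtype fixed to int64 so the stride values are predictable), one can compare the flattened values and the underlying buffer before and after a reshape:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.int64)
reshaped = a.reshape(4, 2)  # default order='C'; a view in this case

# Flattening either array in C order walks the same sequence of values.
print(a.ravel().tolist())
print(reshaped.ravel().tolist())

# Both arrays use the same buffer; only shape and strides differ.
print(np.shares_memory(a, reshaped))
print(a.strides, reshaped.strides)
```

Only the shape and strides metadata change here; the eight values stay where they are in memory.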

If you pass order='F' to reshape, you get what you want.

>>> b = np.array([1, 2, 3, 4, 5, 6])
>>> b
array([1, 2, 3, 4, 5, 6])
>>> c = b.reshape(2,3,order='F')
>>> c
array([[1, 3, 5],
       [2, 4, 6]])
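
For the use case in the question (numbers arriving one after another from a column-major file), a minimal sketch might look like the following. The list `values` is a hypothetical stand-in for the file contents, not something from the original post:

```python
import numpy as np

# Hypothetical stand-in for values read sequentially from a column-major file.
values = [1, 2, 3, 4, 5, 6]

# Interpret the flat sequence as the columns of a 2x3 matrix.
m = np.array(values, dtype=float).reshape((2, 3), order='F')

print(m)
print(m.flags.f_contiguous)  # whether the result is laid out column-major
```

Checking the flag afterwards is cheap and tells you which layout you actually ended up with.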
Answered By: Kill Console

In general, numpy uses order to describe the memory layout, but the python behavior of the arrays should be consistent regardless of the memory layout. I think you can get the behavior you want using views. A view is an array that shares memory with another array. For example:

import numpy as np

a = np.arange(1, 6 + 1)
b = a.reshape(3, 2).T

a[1] = 99
print(b)
# [[ 1  3  5]
#  [99  4  6]]
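
To check that such a view really shares memory, and which layout it reports, something along these lines should work (a sketch using `np.shares_memory` and the `flags` attribute):

```python
import numpy as np

a = np.arange(1, 6 + 1)
b = a.reshape(3, 2).T

print(np.shares_memory(a, b))   # whether b reuses a's buffer
print(b.flags.f_contiguous)     # whether b's layout is column-major

b[0, 2] = -5                    # writing through the view...
print(a)                        # ...changes the original 1-D array
```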

Hope that helps.

Answered By: Bi Rico

Your question has been answered, but I thought I would add this to explain your observations regarding, “At this point I’m not sure what the F_CONTIGUOUS flag tells us since it does not honor the ordering.”


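The item method doesn’t index the underlying memory directly like you think it does; it takes a flat index in the array’s logical (row-major) order. To see the raw memory, you should access the data attribute, which in Python 2 gives you a buffer that slices as a byte string.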

An example:

c = np.array([[1,2,3],
              [4,6,7]], order='C')

f = np.array([[1,2,3],
              [4,6,7]], order='F')

Observe

print c.flags.c_contiguous, f.flags.f_contiguous
# True True

and

print c.nbytes == len(c.data)
# True

Now let’s print the contiguous data for both:

nelements = np.prod(c.shape)
bsize = c.dtype.itemsize # should be 8 bytes for 'int64'
for i in range(nelements):
    bnum = c.data[i*bsize : (i+1)*bsize] # The element as a byte string.
    print np.fromstring(bnum, dtype=c.dtype)[0], # Convert to number.

This prints:

1 2 3 4 6 7

which is what we expect since c is order 'C', i.e., its data is stored row-major contiguous.

On the other hand,

nelements = np.prod(f.shape)
bsize = f.dtype.itemsize # should be 8 bytes for 'int64'
for i in range(nelements):
    bnum = f.data[i*bsize : (i+1)*bsize] # The element as a byte string.
    print np.fromstring(bnum, dtype=f.dtype)[0], # Convert to number.

prints

1 4 2 6 3 7

which, again, is what we expect to see since f’s data is stored column-major contiguous.
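
The snippets above assume Python 2. Under Python 3 the data attribute is a memoryview and slicing it this way may not behave the same, so here is a rough equivalent that goes through `tobytes(order='A')` and `np.frombuffer` instead. The helper name `memory_order` is made up for this sketch, and dtype is pinned to int64 as an assumption:

```python
import numpy as np

c = np.array([[1, 2, 3], [4, 6, 7]], dtype=np.int64, order='C')
f = np.array([[1, 2, 3], [4, 6, 7]], dtype=np.int64, order='F')

def memory_order(arr):
    """Hypothetical helper: list a contiguous array's elements in memory order."""
    # order='A' emits Fortran order if arr is Fortran-contiguous, else C order,
    # which matches the physical layout for contiguous arrays.
    raw = arr.tobytes(order='A')
    return np.frombuffer(raw, dtype=arr.dtype).tolist()

print(memory_order(c))  # [1, 2, 3, 4, 6, 7]
print(memory_order(f))  # [1, 4, 2, 6, 3, 7]
```

Note that `tobytes` makes a copy, so this only inspects the layout; it does not give write access to the buffer.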

Answered By: Matt Hancock

Here is a simple way to print the data in memory order, by using the ravel() function:

>>> import numpy as np
>>> a = np.ndarray(shape=(2,3), order='F')
>>> for i in range(6): a.itemset(i, i+1)

>>> print(a.ravel(order='K'))
[ 1.  4.  2.  5.  3.  6.]

This confirms that the array is stored in Fortran order.
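
A variant of the same check that avoids itemset (which, as far as I know, is no longer available in recent NumPy releases) and also prints the strides; treat it as a sketch rather than the canonical way:

```python
import numpy as np

a = np.empty((2, 3), order='F')   # float64, column-major storage
a.flat[:] = np.arange(1, 7)       # flat assigns in logical (row-major) order

print(a.ravel(order='C'))         # logical order: 1 2 3 4 5 6
print(a.ravel(order='K'))         # memory order:  1 4 2 5 3 6
print(a.strides)                  # bytes to step per axis
```

With 8-byte floats the strides come out as (8, 16): moving down a column is the smallest step, which is what column-major storage means.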

Answered By: cfh

Wanted to add this in the comments but my rep is too low:

While Kill Console’s answer gave the OP’s required solution, I think it’s important to note that as stated in the numpy.reshape() documentation (https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html):

Note there is no guarantee of the memory layout (C- or Fortran- contiguous) of the returned array.

so even if the view is column-wise, the data itself may not be, which may lead to inefficiencies in calculations which benefit from the data being stored column-wise in memory. Perhaps:

a = np.array(np.array([1, 2, 3, 4, 5, 6]).reshape(2,3,order='F'), order='F')

provides more of a guarantee that the data is stored column-wise (see order argument description at https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.array.html).
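
Another option, offered here only as a sketch, is `np.asfortranarray`, which returns a Fortran-ordered array and should copy only when the input is not already laid out that way:

```python
import numpy as np

row_major = np.arange(1, 7).reshape(2, 3)      # C-contiguous by construction
col_major = np.asfortranarray(row_major)       # copies into Fortran order

print(row_major.flags.f_contiguous, col_major.flags.f_contiguous)
print(np.array_equal(row_major, col_major))    # same values, different layout
print(col_major.ravel(order='K').tolist())     # memory order of the copy

# If the input is already Fortran-contiguous, no copy should be needed.
already_f = np.array([1, 2, 3, 4, 5, 6]).reshape((2, 3), order='F')
same = np.asfortranarray(already_f)
print(np.shares_memory(already_f, same))
```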

Answered By: KamKam