NumPy – What is the difference between frombuffer and fromstring?

Question:

They appear to give the same result to me:

In [32]: s
Out[32]: 'x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x15x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00'

In [27]: np.frombuffer(s, dtype="int8")
Out[27]:
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0, 21,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0], dtype=int8)

In [28]: np.fromstring(s, dtype="int8")
Out[28]:
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0, 21,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0], dtype=int8)

In [33]: b = buffer(s)

In [34]: b
Out[34]: <read-only buffer for 0x035F8020, size -1, offset 0 at 0x036F13A0>

In [35]: np.fromstring(b, dtype="int8")
Out[35]:
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0, 21,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0], dtype=int8)

In [36]: np.frombuffer(b, dtype="int8")
Out[36]:
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0, 21,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0], dtype=int8)

When should one be used vs. the other?

Asked By: user202987


Answers:

From a practical standpoint, the difference is that:

x = np.fromstring(s, dtype='int8')

will make a copy of the string's data in memory, while:

x = np.frombuffer(s, dtype='int8')

or

x = np.frombuffer(buffer(s), dtype='int8')

will use the memory buffer of the string directly and won’t use any* additional memory. Using frombuffer also results in a read-only array when the underlying object is a string, because strings are immutable in Python.

(*Neglecting a few bytes of memory used for the additional Python ndarray object; the underlying memory for the data is shared.)
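To make the copy-versus-share distinction concrete, here is a minimal sketch for Python 3, where a bytes object takes the place of the Python 2 string from the question. The names raw, view and copied are only illustrative, and .copy() is used here as a stand-in for the copying behaviour of fromstring rather than calling fromstring itself.

```python
import numpy as np

# 64 bytes mirroring the string in the question: 20 zeros, 0x15, 43 zeros.
raw = bytes(20) + b"\x15" + bytes(43)

# frombuffer wraps the existing memory: no copy of the data is made.
view = np.frombuffer(raw, dtype="int8")

# An explicit copy owns its own, independent memory.
copied = np.frombuffer(raw, dtype="int8").copy()

print(view.flags.owndata, view.flags.writeable)      # False False
print(copied.flags.owndata, copied.flags.writeable)  # True True

# Writing through the read-only view is rejected by NumPy.
try:
    view[0] = 1
except ValueError as exc:
    print("cannot write:", exc)

# Writing to the copy works and leaves the original bytes untouched.
copied[0] = 1
print(raw[0])  # 0
```

The flags show the trade-off: the view costs no extra data memory but cannot be modified, while the copy is writable at the price of duplicating the data.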


If you’re not familiar with buffer objects (memoryview in Python 3.x), they’re essentially a way for C-level libraries to expose a block of memory for use in Python. In other words, a buffer is a Python interface for managed access to raw memory.

If you are working with something that exposes the buffer interface, then you’d probably want to use frombuffer. (Python 2.x strings and Python 3.x bytes expose the buffer interface, but you’ll get a read-only array, as they are immutable.)
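As a contrast to the immutable case, the following sketch assumes a mutable object that exposes the buffer interface, here a bytearray (the names buf and arr are illustrative). Under that assumption, frombuffer gives a writable array that shares memory with the object in both directions.

```python
import numpy as np

# A bytearray is mutable and exposes the buffer interface.
buf = bytearray(8)

# The array is a writable window onto buf's memory, not a copy.
arr = np.frombuffer(buf, dtype="uint8")
print(arr.flags.writeable)  # True

# A write through the array is visible in the bytearray...
arr[3] = 21
print(buf[3])  # 21

# ...and a write to the bytearray is visible through the array.
buf[0] = 7
print(arr[0])  # 7
```

Because the memory is shared, the array is only valid while the underlying object keeps its layout, which is part of why this route suits cases where you want tight control over memory.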

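Otherwise, use fromstring to create a NumPy array from a string, unless you know what you’re doing and want to tightly control memory use. Note that NumPy 1.14 and later deprecate fromstring for binary data; there, np.frombuffer(s, dtype='int8').copy() gives the same independent, writable copy.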

Answered By: Joe Kington