NumPy – What is the difference between frombuffer and fromstring?
Question:
They appear to give the same result to me:
In [32]: s
Out[32]: '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
In [27]: np.frombuffer(s, dtype="int8")
Out[27]:
array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int8)
In [28]: np.fromstring(s, dtype="int8")
Out[28]:
array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int8)
In [33]: b = buffer(s)
In [34]: b
Out[34]: <read-only buffer for 0x035F8020, size -1, offset 0 at 0x036F13A0>
In [35]: np.fromstring(b, dtype="int8")
Out[35]:
array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int8)
In [36]: np.frombuffer(b, dtype="int8")
Out[36]:
array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int8)
When should one be used vs. the other?
Answers:
From a practical standpoint, the difference is that:
x = np.fromstring(s, dtype='int8')
Will make a copy of the string in memory, while:
x = np.frombuffer(s, dtype='int8')
or
x = np.frombuffer(buffer(s), dtype='int8')
Will use the memory buffer of the string directly and won’t use any* additional memory. Using frombuffer will also result in a read-only array if the input to buffer is a string, as strings are immutable in Python.
(*Neglecting a few bytes of memory used for an additional Python ndarray object. The underlying memory for the data will be shared.)
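The copy-versus-view distinction above can be checked directly. The following is a minimal sketch, assuming Python 3 (where the question's string becomes a bytes object) and a recent NumPy. Binary-mode fromstring has been deprecated since NumPy 1.14, so the copying behaviour is reproduced here with frombuffer(...).copy() rather than with fromstring itself.

```python
import numpy as np

# 64 bytes, all zero except byte 20 (0x15 == 21), mirroring the question's data.
s = b"\x00" * 20 + b"\x15" + b"\x00" * 43

view = np.frombuffer(s, dtype="int8")  # no copy: wraps the memory of the bytes object
copy = view.copy()                     # explicit copy: the array owns its own memory

print(view.flags.owndata, view.flags.writeable)  # False False
print(copy.flags.owndata, copy.flags.writeable)  # True True

# Writing to the view is rejected because bytes objects are immutable.
try:
    view[0] = 1
    write_rejected = False
except ValueError:
    write_rejected = True
print(write_rejected)  # True

# Writing to the copy is fine and leaves the original bytes untouched.
copy[0] = 1
print(s[0], copy[0])  # 0 1
```

The owndata flag is a convenient way to tell the two cases apart: it is False when the array borrows memory from another object and True when the array allocated its own.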
If you’re not familiar with buffer objects (memoryview in Python 3.x), they’re essentially a way for C-level libraries to expose a block of memory for use in Python. It’s basically a Python interface for managed access to raw memory.
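To make the idea of shared raw memory concrete, here is a small sketch assuming Python 3, using a bytearray as the mutable object that exposes the buffer interface. The variable names are illustrative only. Because the underlying object is mutable, the resulting array is writable, and writes are visible from both sides.

```python
import numpy as np

raw = bytearray(8)                       # mutable block of 8 zero bytes
arr = np.frombuffer(raw, dtype="uint8")  # writable array sharing raw's memory

arr[0] = 255  # write through the array ...
raw[1] = 7    # ... or through the original object

print(arr.tolist())  # [255, 7, 0, 0, 0, 0, 0, 0]

# A memoryview slice exposes part of the same memory, again without copying.
window = np.frombuffer(memoryview(raw)[2:6], dtype="uint8")
window[:] = 9

print(list(raw))  # [255, 7, 9, 9, 9, 9, 0, 0]
```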
If you were working with something that exposed the buffer interface, then you’d probably want to use frombuffer. (Python 2.x strings and Python 3.x bytes expose the buffer interface, but you’ll get a read-only array, as Python strings are immutable.)
Otherwise, use fromstring to create a NumPy array from a string. (Unless you know what you’re doing, and want to tightly control memory use, etc.)
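On current NumPy versions the same choice can be expressed with frombuffer alone, since binary-mode fromstring is deprecated. The helper below is a hypothetical sketch, not part of NumPy: the name bytes_to_array and its copy parameter are assumptions made for illustration. It defaults to the safe, copying behaviour and lets the caller opt into a zero-copy view.

```python
import numpy as np

def bytes_to_array(data, dtype="int8", copy=True):
    """Hypothetical helper: build an array from a bytes-like object.

    copy=True  -> independent, writable array (the old fromstring behaviour)
    copy=False -> zero-copy view; read-only if `data` is immutable
    """
    view = np.frombuffer(data, dtype=dtype)
    return view.copy() if copy else view

payload = b"\x01\x02\x03\x04"

owned = bytes_to_array(payload)               # safe default: private copy
shared = bytes_to_array(payload, copy=False)  # shares memory with payload

owned[0] = 100  # allowed, and payload is unchanged

print(owned.tolist())          # [100, 2, 3, 4]
print(shared.tolist())         # [1, 2, 3, 4]
print(shared.flags.writeable)  # False
```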