What is the difference between the types <type 'numpy.string_'> and <type 'str'>?

Question:

Is there a difference between the types <type 'numpy.string_'> and <type 'str'>?

Asked By: Blunt


Answers:

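numpy.string_ is the NumPy datatype used for arrays containing fixed-width byte strings. On the other hand, str is a native Python type. It is not itself a NumPy array datatype: passing dtype=str is accepted, but NumPy converts it to one of its own fixed-width string dtypes, and a normal NumPy array cannot hold Python str objects directly*.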
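A quick way to see how the two types relate is the short Python 3 sketch below (assuming NumPy is installed). It uses np.bytes_, which is the same type that older NumPy versions also exposed as np.string_; that alias was removed in NumPy 2.0.

```python
import numpy as np

# NumPy's byte-string scalar type subclasses the builtin bytes type.
print(issubclass(np.bytes_, bytes))  # True

# Passing the builtin str as a dtype is accepted, but NumPy translates it
# into its own fixed-width Unicode dtype instead of storing str objects.
dt = np.dtype(str)
print(dt.kind)  # U

# Indexing a byte-string array returns a NumPy scalar, not a plain bytes object.
arr = np.array([b"abc", b"xy"])
elem = arr[0]
print(type(elem) is np.bytes_)  # True
print(elem == b"abc")  # True
```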

If you create a NumPy array from byte strings (str in Python 2, bytes in Python 3), the array will use the numpy.string_ type; an array built from Python 3 str objects uses the numpy.unicode_ type instead. More precisely, the array will use a sub-datatype of np.string_:

>>> import numpy as np
>>> a = np.array([b'abc', b'xy'])
>>> a
array([b'abc', b'xy'], dtype='|S3')
>>> np.issubdtype(a.dtype, np.bytes_)  # np.bytes_ is np.string_ (that alias was removed in NumPy 2.0)
True

In this case the datatype is '|S3': the | means that byte order does not apply (each character is a single byte), S denotes the byte-string type, and 3 indicates that each value in the array holds up to three bytes.
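Each part of that dtype string can also be inspected programmatically. A minimal Python 3 sketch (assuming NumPy; the byte-order character shown for the Unicode array assumes a little-endian machine):

```python
import numpy as np

a = np.array([b"abc", b"xy"])
dt = a.dtype

print(dt.str)        # |S3
print(dt.byteorder)  # |  (byte order does not apply to 1-byte characters)
print(dt.kind)       # S  (byte-string kind)
print(dt.itemsize)   # 3  (every element occupies exactly 3 bytes)

# Shorter values are padded with null bytes in the array's buffer;
# NumPy strips the trailing nulls when an element is read back.
print(a.tobytes())   # b'abcxy\x00'
print(a[1])          # b'xy'

# A Python 3 str array gets the Unicode kind, using 4 bytes per character.
u = np.array(["abc", "xy"])
print(u.dtype.str, u.dtype.itemsize)  # <U3 12 on a little-endian machine
```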

One property that np.string_ and str share is immutability. Trying to increase the length of a Python str object will instead create a new object in memory. Similarly, the width of a NumPy string array is fixed when the array is created: assigning a longer value to an element silently truncates it, so if you want the array to hold more characters, a new, larger array has to be created in memory.
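The fixed width is easy to demonstrate. A small sketch, assuming Python 3 and NumPy: elements of a '|S3' array can be reassigned, but values longer than three bytes are cut off, and widening requires a new array, made here with astype.

```python
import numpy as np

a = np.array([b"abc", b"xy"])  # dtype '|S3'

# Elements can be reassigned, but the width is fixed: a longer value is silently truncated.
a[1] = b"wxyz"
print(a)  # [b'abc' b'wxy']

# To store longer values, create a new, wider array; astype returns a copy.
wider = a.astype("S6")
wider[1] = b"wxyz"
print(wider)  # [b'abc' b'wxyz']
print(a.dtype.itemsize, wider.dtype.itemsize)  # 3 6
print(np.shares_memory(a, wider))  # False: the wider array uses separate memory
```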


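* Note that it is possible to create a NumPy object array (dtype=object) that contains references to Python str objects, but such arrays behave quite differently from normal arrays: each element is a separate Python object rather than a fixed-width value stored in the array's own memory.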
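For contrast, here is a sketch of such an object array (Python 3, NumPy assumed); the variable names are illustrative only.

```python
import numpy as np

# dtype=object makes the array hold references to ordinary Python objects.
obj = np.array(["abc", "a much longer string"], dtype=object)
print(obj.dtype.kind)        # O
print(type(obj[0]) is str)   # True: elements are plain Python str objects

# There is no fixed width, so a longer value is stored without truncation.
obj[0] = "abcdef"
print(obj[0])                # abcdef

# A normal string array, by contrast, returns NumPy scalars when indexed.
fixed = np.array(["abc", "xy"])
print(type(fixed[0]) is np.str_)  # True
```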

Answered By: Alex Riley