Converting int arrays to string arrays in numpy without truncation
Question:
I am trying to convert int arrays to string arrays in numpy:
In [66]: a=array([0,33,4444522])
In [67]: a.astype(str)
Out[67]:
array(['0', '3', '4'],
dtype='|S1')
This is not what I intended.
In [68]: a.astype('S10')
Out[68]:
array(['0', '33', '4444522'],
dtype='|S10')
This works but I had to know 10 was big enough to hold my longest string. Is there a way of doing this easily without knowing ahead of time what size string you need? It seems a little dangerous that it just quietly truncates your string without throwing an error.
Answers:
Again, this can be solved in pure Python:
>>> map(str, [0,33,4444522])
['0', '33', '4444522']
Or if you need to convert back and forth:
>>> a = np.array([0,33,4444522])
>>> np.array(map(str, a))
array(['0', '33', '4444522'],
dtype='|S7')
You can find the smallest sufficient width like so:
In [3]: max(len(str(x)) for x in [0,33,4444522])
Out[3]: 7
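That width can be passed straight to astype so the size never has to be guessed. A minimal sketch, assuming a non-empty integer array; the helper name to_byte_strings is made up for illustration and is not part of numpy:

```python
import numpy as np

def to_byte_strings(arr):
    # Hypothetical helper: find the longest decimal representation
    # (the sign counts as a character), then cast to a fixed-width
    # byte-string dtype of exactly that width.
    width = max(len(str(x)) for x in arr.ravel().tolist())
    return arr.astype('S%d' % width)

a = np.array([0, 33, 4444522])
res = to_byte_strings(a)
print(res.dtype, res.tolist())
```

Because the width is derived from the data, no element can be cut short, at the cost of one extra pass over the array.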
Alternatively, just construct the ndarray from a list of strings:
In [7]: np.array([str(x) for x in [0,33,4444522]])
Out[7]:
array(['0', '33', '4444522'],
dtype='|S7')
or, using map():
In [8]: np.array(map(str, [0,33,4444522]))
Out[8]:
array(['0', '33', '4444522'],
dtype='|S7')
You can stay in numpy, doing
np.char.mod('%d', a)
This is twice as fast as map or a list comprehension for 10 elements, and four times as fast for 100. This and other string operations are documented in NumPy's string operations (numpy.char) reference.
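For completeness, here is the np.char.mod call as a self-contained snippet. It is only a sketch of the approach described above; the timing claims are the answerer's and are not reproduced here:

```python
import numpy as np

a = np.array([0, 33, 4444522])

# np.char.mod applies '%d' % x to every element and picks a
# string width for the result on its own.
res = np.char.mod('%d', a)
print(res.tolist())
```

Other format strings work the same way, for example '%05d' for zero-padded output.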
np.apply_along_axis(lambda y: [str(i) for i in y], 0, x)
Example
>>> import numpy as np
>>> x = np.array([-1]*10+[0]*10+[1]*10)
>>> x
array([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
>>> np.apply_along_axis(lambda y: [str(i) for i in y], 0, x).tolist()
['-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '0', '0',
'0', '0', '0', '0', '0', '0', '0', '0', '1', '1', '1', '1', '1', '1',
'1', '1', '1', '1']
Use arr.astype(str), as int to str conversion is now supported by numpy with the desired outcome:
import numpy as np
a = np.array([0,33,4444522])
res = a.astype(str)
print(repr(res))
array(['0', '33', '4444522'],
dtype='<U11')
The width in the dtype follows the integer type: '<U11' for 32-bit integers, '<U21' for 64-bit ones.
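Since the original worry was silent truncation, one way to gain confidence in the result is to cast it back and compare. A small sketch, assuming a NumPy version in which astype(str) behaves as shown above; the extra negative value is added here only to exercise the sign:

```python
import numpy as np

a = np.array([0, 33, 4444522, -987654321])
res = a.astype(str)

# Parse the strings back into the original integer dtype;
# any truncated element would fail to match.
round_trip = res.astype(a.dtype)
print((round_trip == a).all())
```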
In Python 3, map() returns an iterator rather than a list, so wrap it in list():
list(map(str, [1,2,3]))
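The same applies to the np.array(map(...)) variants in the earlier answers, which were written for Python 2. A short sketch of the Python 3 form, using the array from the question:

```python
import numpy as np

a = np.array([0, 33, 4444522])

# list() materialises the lazy map object so np.array receives
# a sequence of strings and can size the dtype from the longest one.
res = np.array(list(map(str, a)))
print(res.tolist())
```

Without the list() call, np.array would wrap the map object itself instead of converting its elements.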