How to read a NumPy 2D array from a string?
Question:
How can I read a NumPy array from a string? Take a string like:
"[[ 0.5544 0.4456], [ 0.8811 0.1189]]"
and convert it to an array:
a = from_string("[[ 0.5544 0.4456], [ 0.8811 0.1189]]")
where a becomes the object: np.array([[0.5544, 0.4456], [0.8811, 0.1189]]).
I’m looking for a very simple interface: a way to convert 2D arrays (of floats) to a string, and a way to read them back to reconstruct the array:
arr_to_string(array([[0.5544, 0.4456], [0.8811, 0.1189]])) should return "[[ 0.5544 0.4456], [ 0.8811 0.1189]]".
string_to_arr("[[ 0.5544 0.4456], [ 0.8811 0.1189]]") should return the object array([[0.5544, 0.4456], [0.8811, 0.1189]]).
Ideally arr_to_string would have a precision parameter that controls the precision of floating-point values converted to strings, so that you wouldn’t get entries like 0.4444444999999999999999999.
There’s nothing I can find in the NumPy docs that does this both ways. np.save lets you make a string but then there’s no way to load it back in (np.load only works for files).
Answers:
I’m not sure there’s an easy way to do this if you don’t have commas between the numbers in your inner lists, but if you do, then you can use ast.literal_eval:
import ast
import numpy as np
s = '[[ 0.5544, 0.4456], [ 0.8811, 0.1189]]'
np.array(ast.literal_eval(s))
array([[ 0.5544, 0.4456],
[ 0.8811, 0.1189]])
EDIT: I haven’t tested it very much, but you could use re to insert commas where you need them:
import re
s1 = '[[ 0.5544 0.4456], [ 0.8811 -0.1189]]'
# Replace spaces between numbers with commas:
s2 = re.sub(r'(\d) +(-|\d)', r'\1,\2', s1)
s2
'[[ 0.5544,0.4456], [ 0.8811,-0.1189]]'
and then hand it on to ast.literal_eval:
np.array(ast.literal_eval(s2))
array([[ 0.5544, 0.4456],
[ 0.8811, -0.1189]])
(You need to be careful to match spaces between digits but also spaces between a digit and a minus sign.)
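The two steps above can be folded into one helper. This is only a sketch: the name string_to_arr is taken from the question, the regular expressions are my own guess at a reasonably tolerant pattern, and it assumes the string holds plain finite numbers (nan and inf would not survive ast.literal_eval).

```python
import ast
import re

import numpy as np


def string_to_arr(s):
    """Parse a bracketed 2D array string whose numbers are separated by
    spaces and/or commas (hypothetical helper, not a NumPy function)."""
    # Put a comma wherever whitespace separates the end of one number
    # (a digit or '.') from the start of the next (digit, sign or '.').
    s = re.sub(r"(?<=[\d.])\s+(?=[-+.\d])", ",", s)
    # str(arr) separates rows with a newline instead of a comma.
    s = re.sub(r"\]\s+\[", "],[", s)
    return np.array(ast.literal_eval(s), dtype=float)


a = string_to_arr("[[ 0.5544 0.4456], [ 0.8811 -0.1189]]")
b = string_to_arr(str(np.array([[1.0, 2.0], [3.0, 4.0]])))
```

Strings that already contain commas pass through unchanged, because the first pattern only fires on whitespace that directly follows a digit or a decimal point.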
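The challenge is to save not only the data buffer, but also the shape and dtype. np.fromstring reads the data buffer, but as a 1d array; you have to get the dtype and shape from elsewhere. (Current NumPy deprecates binary-mode np.fromstring and a.tostring(); np.frombuffer and a.tobytes() are the replacements.)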
In [184]: a=np.arange(12).reshape(3,4)
In [185]: np.fromstring(a.tostring(),int)
Out[185]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [186]: np.fromstring(a.tostring(),a.dtype).reshape(a.shape)
Out[186]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
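For reference, the same round trip with the non-deprecated calls might look like the sketch below. Packing the dtype and shape into a tuple next to the bytes is just one way to carry the metadata along; the variable names are mine.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Raw bytes carry no metadata, so keep dtype and shape alongside them.
payload = (a.tobytes(), a.dtype.str, a.shape)

raw, dtype_str, shape = payload
# frombuffer returns a read-only view onto the bytes; copy() makes the
# result an independent, writable array.
b = np.frombuffer(raw, dtype=dtype_str).reshape(shape).copy()
```

Note that the bytes are not human readable, so this answers the "round trip" part of the question but not the "string with controlled precision" part.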
A time-honored mechanism to save Python objects is pickle, and numpy is pickle compliant:
In [169]: import pickle
In [170]: a=np.arange(12).reshape(3,4)
In [171]: s=pickle.dumps(a*2)
In [172]: s
Out[172]: "cnumpy.core.multiarray\n_reconstruct\np0\n(cnumpy\nndarray\np1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I3\nI4\ntp6\ncnumpy\ndtype\np7\n(S'i4'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x04\\x00\\x00\\x00\\x06\\x00\\x00\\x00\\x08\\x00\\x00\\x00\\n\\x00\\x00\\x00\\x0c\\x00\\x00\\x00\\x0e\\x00\\x00\\x00\\x10\\x00\\x00\\x00\\x12\\x00\\x00\\x00\\x14\\x00\\x00\\x00\\x16\\x00\\x00\\x00'\np13\ntp14\nb."
In [173]: pickle.loads(s)
Out[173]:
array([[ 0, 2, 4, 6],
[ 8, 10, 12, 14],
[16, 18, 20, 22]])
There’s a numpy function that can read the pickle string (np.loads has since been deprecated and removed from NumPy; pickle.loads does the same job):
In [181]: np.loads(s)
Out[181]:
array([[ 0, 2, 4, 6],
[ 8, 10, 12, 14],
[16, 18, 20, 22]])
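You mentioned using np.save to write to a string, but that you can’t use np.load. A way around that is to step further into the code, and use np.lib.npyio.format. (This session is Python 2; on Python 3 the buffer would be io.BytesIO rather than StringIO.StringIO, and the module is reachable as np.lib.format.)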
In [174]: import StringIO
In [175]: S=StringIO.StringIO() # a file like string buffer
In [176]: np.lib.npyio.format.write_array(S,a*3.3)
In [177]: S.seek(0) # rewind the string
In [178]: np.lib.npyio.format.read_array(S)
Out[178]:
array([[ 0. , 3.3, 6.6, 9.9],
[ 13.2, 16.5, 19.8, 23.1],
[ 26.4, 29.7, 33. , 36.3]])
The save string has a header with dtype and shape info:
In [179]: S.seek(0)
In [180]: S.readlines()
Out[180]:
["\x93NUMPY\x01\x00F\x00{'descr': '<f8', 'fortran_order': False, 'shape': (3, 4), } \n",
 '\x00\x00\x00\x00\x00\x00\x00\x00ffffff\n',
 '@ffffff\x1a@\xcc\xcc\xcc\xcc\xcc\xcc#@ffffff*@\x00\x00\x00\x00\x00\x800@\xcc\xcc\xcc\xcc\xcc\xcc3@\x99\x99\x99\x99\x99\x197@ffffff:@33333\xb3=@\x00\x00\x00\x00\x00\x80@@fffff&B@']
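On Python 3 there is a shorter route, because np.save and np.load both accept file-like objects and not only filenames. A minimal sketch using an in-memory io.BytesIO buffer (the variable names are illustrative):

```python
import io

import numpy as np

a = np.arange(12).reshape(3, 4) * 3.3

# Write the .npy representation into an in-memory buffer.
buf = io.BytesIO()
np.save(buf, a)
data = buf.getvalue()  # a bytes object: header plus raw data

# Read it back by wrapping the bytes in a fresh buffer.
restored = np.load(io.BytesIO(data))
```

The result is bytes rather than text, so it round-trips dtype and shape exactly but is not something you would read or edit by hand.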
If you want a human-readable string, you might try json.
In [196]: import json
In [197]: js=json.dumps(a.tolist())
In [198]: js
Out[198]: '[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]'
In [199]: np.array(json.loads(js))
Out[199]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Going to/from the list representation of the array is the most obvious use of json. Someone may have written a more elaborate json representation of arrays.
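The json route can also cover the precision parameter the question asks for. The sketch below borrows the names arr_to_string and string_to_arr from the question; rounding with np.round before dumping is my assumption about what "precision" should mean here (decimal places), and the output uses commas between the numbers, unlike the exact format in the question.

```python
import json

import numpy as np


def arr_to_string(arr, precision=4):
    """Serialize an array of floats as JSON, rounded to `precision` decimals."""
    rounded = np.round(np.asarray(arr, dtype=float), precision)
    return json.dumps(rounded.tolist())


def string_to_arr(s):
    """Rebuild a float array from the JSON produced by arr_to_string."""
    return np.array(json.loads(s), dtype=float)


s = arr_to_string([[0.5544, 0.4456], [0.8811, 0.1189]])
back = string_to_arr(s)
```

The dtype is fixed to float on the way back in, which matches the question's "2D arrays (of floats)" but would need extending for other dtypes.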
You could also go the csv format route – there have been lots of questions about reading/writing csv arrays.
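As a rough illustration of the csv route, np.savetxt and np.loadtxt can work against an in-memory io.StringIO buffer, and the fmt argument controls the printed precision. The "%.4f" format and the comma delimiter are choices made for this example only.

```python
import io

import numpy as np

a = np.array([[0.5544, 0.4456], [0.8811, 0.1189]])

# Write one row per line, four decimals, comma separated.
buf = io.StringIO()
np.savetxt(buf, a, fmt="%.4f", delimiter=",")
text = buf.getvalue()

# ndmin=2 keeps a single-row input two-dimensional.
b = np.loadtxt(io.StringIO(text), delimiter=",", ndmin=2)
```

The shape is implied by the line layout, so this only handles 1D and 2D arrays.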
'[[ 0.5544 0.4456], [ 0.8811 0.1189]]' is a poor string representation for this purpose. It does look a lot like the str() of an array, but with , instead of \n. But there isn’t a clean way of parsing the nested [], and the missing delimiter is a pain. If it consistently uses , then json can convert it to a list.
np.matrix accepts a MATLAB-like string:
In [207]: np.matrix(' 0.5544, 0.4456;0.8811, 0.1189')
Out[207]:
matrix([[ 0.5544, 0.4456],
[ 0.8811, 0.1189]])
In [208]: str(np.matrix(' 0.5544, 0.4456;0.8811, 0.1189'))
Out[208]: '[[ 0.5544 0.4456]\n [ 0.8811 0.1189]]'
Forward to string:
import numpy as np
def array2str(arr, precision=None):
    s = np.array_str(arr, precision=precision)
    # Put the rows on one line: each newline becomes a comma
    return s.replace('\n', ',')
Backward to array:
import re
import ast
import numpy as np
def str2array(s):
    # Remove space after [
    s = re.sub(r'\[ +', '[', s.strip())
    # Replace runs of commas and whitespace with a single ', '
    s = re.sub(r'[,\s]+', ', ', s)
    return np.array(ast.literal_eval(s))
If you use repr() to convert the array to a string, the conversion will be trivial.
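A closely related option, swapped in here because it avoids the "array(...)" wrapper that repr() adds, is np.array2string with an explicit separator: the output is then a valid Python list literal that ast.literal_eval can read directly. This is a sketch, with function names taken from the question; it assumes finite values and sets threshold so that large arrays are not abbreviated with "...".

```python
import ast

import numpy as np


def arr_to_string(arr, precision=4):
    """Format an array as a comma-separated nested list literal."""
    arr = np.asarray(arr)
    return np.array2string(
        arr,
        precision=precision,
        separator=", ",
        threshold=arr.size + 1,  # never summarize with '...'
    )


def string_to_arr(s):
    """Parse the output of arr_to_string back into an array."""
    return np.array(ast.literal_eval(s))


original = np.array([[0.5544, 0.4456], [0.8811, 0.1189]])
text = arr_to_string(original)
restored = string_to_arr(text)
```

The string still contains a newline between rows, which literal_eval accepts inside brackets; replace it if a single-line string is needed.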
numpy.fromstring() allows you to easily create 1D arrays from a string. Here’s a simple function to create a 2D numpy array from a string:
import numpy as np
def str2np(strArray):
    lItems = []
    width = None
    for line in strArray.split("\n"):
        lParts = line.split()
        n = len(lParts)
        # Skip blank lines
        if n == 0:
            continue
        # Every non-blank row must have the same number of columns
        if width is None:
            width = n
        else:
            assert n == width, "invalid array spec"
        lItems.append([float(part) for part in lParts])
    return np.array(lItems)
Usage:
X = str2np("""
-2 2
-1 3
0 1
1 1
2 -1
""")
print(f"X = {X}")
Output:
X = [[-2. 2.]
[-1. 3.]
[ 0. 1.]
[ 1. 1.]
[ 2. -1.]]
In my case I found the following command helpful for dumping:
string = str(array.tolist())
And for reloading:
array = np.array( eval(string) )
This should work for any dimensionality of numpy array.
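One caution on the reload step: eval will execute whatever the string contains. Since str(array.tolist()) is just a nested list of numbers, ast.literal_eval can stand in for eval and only accepts literals. A small sketch with a 3D array (the sample values are arbitrary):

```python
import ast

import numpy as np

array = np.array([[[0.5544, 0.4456]], [[0.8811, 0.1189]]])  # shape (2, 1, 2)

# Dump: nested Python lists, printed with full float precision.
string = str(array.tolist())

# Reload: literal_eval parses list/number literals only, never runs code.
restored = np.array(ast.literal_eval(string))
```

This does not work for arrays containing nan or inf, because those are printed as bare names rather than literals.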