Evaluating a string as a numpy array

Question:

My team is migrating from ClickHouse to Azure Data Explorer (ADX). We are having trouble querying our data from ADX: the queried values are correct, but they come back as a string rather than as an array of floats.

Here is an example string:

mydummystring='[1.0,2.0,3.0][4.0,5.0,6.0][6.0,7.0,8.0]'

In order to convert this string to a numpy array, I found this workaround based on list comprehension (inspired by this SO post):

import numpy as np
mynumpyarray = np.array([np.array(x) for x in eval('['+mydummystring.replace('][', '],[')+']')])

Is there a better (safer?) way to achieve this conversion? I know that it would be better to read the data correctly in the first place, but for now I am looking for a robust way to convert the output string to actual numbers.

Asked By: Sheldon


Answers:

You can convert '[1.0,2.0,3.0][4.0,5.0,6.0][6.0,7.0,8.0]' to '[[1.0,2.0,3.0],[4.0,5.0,6.0],[6.0,7.0,8.0]]' with str.replace, then use ast.literal_eval.

import ast
import numpy as np
mydummystring = '[1.0,2.0,3.0][4.0,5.0,6.0][6.0,7.0,8.0]'
mydummystring = '[' + mydummystring.replace('][', '],[') + ']'
mydummystring = ast.literal_eval(mydummystring)
arr = np.array(mydummystring)
print(arr)

Or use json.loads:

import json
import numpy as np
mydummystring = '[1.0,2.0,3.0][4.0,5.0,6.0][6.0,7.0,8.0]'
mydummystring = '[' + mydummystring.replace('][', '],[') + ']'
mydummystring = json.loads(mydummystring)
arr = np.array(mydummystring)
print(arr)

[[1. 2. 3.]
 [4. 5. 6.]
 [6. 7. 8.]]
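
A minimal sketch of how the json.loads variant could be wrapped for reuse, assuming every row is expected to have the same number of values. The helper name parse_bracketed_rows and the ragged-row check are illustrative assumptions, not part of the answer above.

```python
import json

import numpy as np


def parse_bracketed_rows(text):
    """Parse '[a,b][c,d]' into a 2-D float array (hypothetical helper)."""
    # Join the back-to-back row groups into one JSON array of arrays.
    rows = json.loads('[' + text.strip().replace('][', '],[') + ']')
    # Reject ragged input early instead of letting NumPy guess what to do.
    if len({len(row) for row in rows}) > 1:
        raise ValueError('rows have different lengths')
    return np.array(rows, dtype=float)


arr = parse_bracketed_rows('[1.0,2.0,3.0][4.0,5.0,6.0]')
print(arr.shape)  # (2, 3)
```

Passing dtype=float explicitly keeps the result numeric even if the source string happens to contain integers such as '[1,2][3,4]'.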
Answered By: I'mahdi

You can use ast.literal_eval, which only parses Python literal structures and does not run arbitrary code.

from ast import literal_eval
import numpy as np
s = '[1.0,2.0,3.0][4.0,5.0,6.0][6.0,7.0,8.0]'
np_arr = np.array([np.array(x) for x in literal_eval('['+s.replace('][', '],[')+']')])

Note that a list comprehension is not necessary to create the NumPy array.

np.array(literal_eval('['+s.replace('][', '],[')+']'))
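
To illustrate the safety difference, here is a small sketch showing that literal_eval rejects a string containing a function call, where eval would have executed it. The helper name try_literal is hypothetical and only serves the demonstration.

```python
from ast import literal_eval


def try_literal(text):
    """Return the parsed literal, or None if text is not a plain literal."""
    try:
        return literal_eval(text)
    except (ValueError, SyntaxError):
        # Function calls, attribute access, names, etc. end up here.
        return None


print(try_literal('[[1.0, 2.0], [3.0, 4.0]]'))   # [[1.0, 2.0], [3.0, 4.0]]
print(try_literal("__import__('os').getcwd()"))  # None
```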
Answered By: Unmitigated

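Without any extra library, but with some string preprocessing.
np.fromstring returns a one-dimensional array, so count the rows, flatten the string, and then reshape.

import numpy as np
s = '[1.0,2.0,3.0][4.0,5.0,6.0][6.0,7.0,8.0]'
# each '[' opens one row
n_rows = s.count('[')
# flat array format: drop the outer brackets and join the rows with commas
flat = s.strip('][').replace('][', ',')
# -1 lets NumPy infer the number of columns, so non-square data also works
a = np.fromstring(flat, sep=',', dtype=float).reshape(n_rows, -1)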
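
The same split-on-'][' idea can also be sketched with plain string methods and float(), which avoids np.fromstring altogether. This is an alternative under the assumption that the string contains only comma-separated numbers inside brackets; rows_to_array is a hypothetical name.

```python
import numpy as np


def rows_to_array(text):
    """Split '[a,b,c][d,e,f]' on '][' and convert each row (hypothetical helper)."""
    # Drop the outermost brackets, then cut the string at every row boundary.
    rows = text.strip().strip('[]').split('][')
    # float() raises ValueError on anything that is not a number.
    return np.array([[float(v) for v in row.split(',')] for row in rows])


a = rows_to_array('[1.0,2.0,3.0][4.0,5.0,6.0]')
print(a.shape)  # (2, 3)
```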
Answered By: cards