Create NumPy array from list with operations
Question:
I have data in a Python list from an SQLite database in the following format:
# This is an example
data = [(1, '12345', 1, 0, None), (1, '34567', 1, 1, None)]
From this list of tuples I want to create a 2D NumPy array, converting each tuple to a row of the array. I also want the values at index 1 in the tuples converted from strings to numbers, and the values at the last index converted to 0 if None, 1 otherwise.
What it should look like afterwards:
transformed_data = np.asarray([[1, 12345, 1, 0, 0], [1, 34567, 1, 1, 0]])
I am able to do this with simple for loops; however, I’d like a more Pythonic solution, with native NumPy methods or otherwise. I am working with a very large database, so complexity matters.
Answers:
pandas is quite good at this:
import pandas as pd

# set up DataFrame
transformed_data = (pd.DataFrame(data)
    # convert to numeric
    .apply(pd.to_numeric, errors='coerce')
    # replace null with 0
    # trying to cast as integer if possible
    .fillna(0, downcast='infer')
    # convert to numpy array
    .to_numpy()
)
Output:
array([[    1, 12345,     1,     0,     0],
       [    1, 34567,     1,     1,     0]])
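Note that the chain above fills missing values with 0 but leaves any non-null value in the last column unchanged, whereas the question asks for 1 in that case. A minimal sketch of one way to handle that, assuming the last column only needs a null/non-null flag; the `'x'` in the second row is a made-up value (not from the question) added to show the non-None case. It also avoids the `downcast` argument of `fillna`, for which recent pandas releases emit a deprecation warning:

```python
import pandas as pd

# Example rows; the trailing 'x' is a hypothetical non-None value
# added here only to exercise the "1 otherwise" branch.
data = [(1, '12345', 1, 0, None), (1, '34567', 1, 1, 'x')]

df = pd.DataFrame(data)  # columns are labelled 0..4

# Last column: 0 where the value is missing, 1 otherwise.
df[4] = df[4].notna().astype(int)

# Column 1: numeric strings to integers.
df[1] = df[1].astype(int)

# All columns are now integer, so this yields a 2D integer array.
transformed_data = df.to_numpy()
```

If column 1 may contain strings that are not valid integers, `astype(int)` will raise; `pd.to_numeric(..., errors='coerce')` as in the answer above is the more forgiving choice there.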
If your tuples are small and of a fixed size, you can use a list comprehension (mapping the last value to 0 if it is None and 1 otherwise, as the question asks):
result = [(a, int(b), c, d, 0 if e is None else 1) for a, b, c, d, e in data]
Or a little shorter:
result = [(d[0], int(d[1]), *d[2:4], int(d[4] is not None)) for d in data]
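The comprehension gives a list of tuples, so one more step is needed to get the 2D array from the question. A rough sketch, using the question's rule for the last value (0 if None, 1 otherwise); the names `rows` and `transformed_data` are just illustrative:

```python
import numpy as np

data = [(1, '12345', 1, 0, None), (1, '34567', 1, 1, None)]

# One pass over the rows: parse the string at index 1 and
# turn the last value into a 0/1 flag.
rows = [(a, int(b), c, d, 0 if e is None else 1) for a, b, c, d, e in data]

# Build the 2D array; an explicit dtype avoids surprises if a row
# ever contains a non-integer value.
transformed_data = np.asarray(rows, dtype=np.int64)
```

This is still linear in the number of rows, same as the plain for loop; the gain is mostly readability rather than complexity.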