Create NumPy array from list with operations

Question:

I have data in a Python list from an SQLite database in the following format:

# This is an example
data = [(1, '12345', 1, 0, None), (1, '34567', 1, 1, None)]

From this list of tuples I want to create a 2D NumPy array, converting each tuple to a row. I also want the values at index 1 in the tuples converted from strings to numbers, and the values at the last index converted to 0 if None, 1 otherwise.

What it should look like afterwards:

transformed_data = np.asarray([[1, 12345, 1, 0, 0], [1, 34567, 1, 1, 0]])

I am able to do this with simple for loops, but I'd like a more Pythonic solution using native NumPy methods or something similar. I am working with a very large database, so complexity matters.

Asked By: Guillaume


Answers:

pandas is quite good at this:

import pandas as pd
                      # set up DataFrame
transformed_data = (pd.DataFrame(data)
                      # convert to numeric
                      .apply(pd.to_numeric, errors='coerce')
                      # replace null with 0
                      # trying to cast as integer if possible
                      .fillna(0, downcast='infer')
                      # convert to numpy array
                      .to_numpy()
                   )

output:

array([[    1, 12345,     1,     0,     0],
       [    1, 34567,     1,     1,     0]])
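
Note that the snippet above leaves non-null values in the last column unchanged rather than mapping them to 1. If a strict 0/1 flag is needed, as the question describes, one possible variant is sketched below. It assumes the column positions from the example data (string at index 1, nullable value at index 4) and avoids the downcast argument, which recent pandas releases may warn about:

```python
import pandas as pd

data = [(1, '12345', 1, 0, None), (1, '34567', 1, 1, None)]

df = pd.DataFrame(data)
# column 1 holds numeric strings: convert them to integers
df[1] = pd.to_numeric(df[1])
# column 4: 0 where the value is None/NaN, 1 otherwise
df[4] = df[4].notna().astype(int)

transformed_data = df.to_numpy()
print(transformed_data)
```

Both conversions are vectorised per column, so no Python-level loop over the rows is needed.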
Answered By: mozway

If your tuple is small and of a fixed size then you can use a list comprehension:

result = [(a, int(b), c, d, 0 if e is None else 1) for a, b, c, d, e in data]

Or, with indexing instead of unpacking:

result = [(d[0], int(d[1]), *d[2:4], 0 if d[4] is None else 1) for d in data]
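
Either comprehension yields a list of tuples, not the 2D array the question asks for. A minimal sketch of the remaining step, assuming every row has exactly five fields as in the example data (the name rows is only illustrative):

```python
import numpy as np

data = [(1, '12345', 1, 0, None), (1, '34567', 1, 1, None)]

# clean each row: parse the string at index 1, turn the last field into a 0/1 flag
rows = [(a, int(b), c, d, 0 if e is None else 1) for a, b, c, d, e in data]

# build the 2D integer array in one call
transformed_data = np.array(rows, dtype=np.int64)
print(transformed_data)
```

This is still a single pass over the data, so the cost grows linearly with the number of rows.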
Answered By: Prins