Pandas dataframe columns of lists to numpy arrays for each column

Question:

I currently have a pandas DataFrame, read in from CSV files, where each column looks like the following one.

>>> train["question1"]
209174            [198, 87, 42, 1568, 193, 7461, 3143, 189]
166856       [198, 110, 1146, 87, 82, 1466, 7, 8, 123, 189]
335224    [198, 89, 42, 3393, 5, 193, 1109, 13, 42, 304,...
244308                      [15, 71360, 1439, 7, 8012, 189]
234779    [39, 15, 8, 440, 2227, 2, 179904, 29563, 47, 9...
213555                       [103, 33, 393, 2707, 291, 189]
288254       [198, 87, 42, 2369, 8, 1033, 26, 8, 1410, 189]
172107    [103, 15, 1, 2334, 119, 8, 201535, 6, 8, 46012...
259159    [198, 110, 70, 4162, 1, 14109, 65, 1, 180, 6, ...
376926    [103, 33, 1, 5395, 7646, 7, 1080, 4, 665, 4078...
376802                      [103, 33, 393, 2707, 1146, 189]
274396      [103, 15, 1, 255, 10820, 125, 83279, 4624, 189]
137372    [198, 87, 42, 311, 8, 127172, 232, 1531, 1293,...
377806    [103, 33, 78, 1421, 5, 1009, 8, 2373, 224, 6, ...
293271    [309002, 46, 198, 89, 82, 659, 8, 996, 14, 309...
102517    [103, 33, 78, 4104, 4, 1122, 6609, 112, 2155, ...
123516       [103, 15, 1, 2801, 4, 8, 1122, 1792, 717, 189]
337879                     [103, 1229, 15, 22208, 188, 189]
112974            [198, 87, 42, 15775, 8, 13837, 2712, 189]
159254    [15, 64, 30, 14673, 11, 17679, 13, 887, 10, 82...
366796    [33, 10058, 12715, 6, 10058, 5599, 1, 216, 874...
395723        [739, 261, 43580, 489, 37, 501, 131, 57, 189]
237095            [198, 6737, 15, 1, 642, 6805, 48605, 189]
337426      [103, 15, 1, 255, 242, 7, 526, 11, 103466, 189]
233527    [103, 120, 1927, 1053, 1703, 62, 19, 17, 29, 1...
155205    [198, 89, 42, 3134, 6385, 6, 4670, 729, 14, 8,...
289580    [190, 1, 298, 79, 496, 30, 240, 7265, 5, 45, 7...
222376    [198, 110, 544, 3483, 500, 7, 1, 96, 237, 63, ...
236585        [103, 1183, 36, 181, 5, 14944, 1, 14490, 189]
234172    [198, 120, 1, 29, 98, 3279, 98, 3279, 98, 1223...

If I get its values and then check their shape, the result is:

>>> train["question1"].values.shape
(283001,)

What I would like is to decompose each column into an ndarray with a shape of (283001, 144).

Asked By: TheM00s3


Answers:

If your lists are all the same length:

np.array(train["question1"].values.tolist())

If they are not, use pd.DataFrame, which pads the shorter rows with NaN for you:

pd.DataFrame(train["question1"].values.tolist()).values
Answered By: piRSquared
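
To make the two cases above concrete, here is a minimal sketch on toy data. The Series named `equal` and `ragged` are hypothetical stand-ins for train["question1"], and the final step assumes that 0 is free to use as a padding value, which may not hold for your token IDs.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train["question1"].
equal = pd.Series([[198, 87, 42], [103, 33, 393]])
ragged = pd.Series([[198, 87, 42, 189], [15, 71360], [103, 33, 393]])

# Equal-length lists: the nested list converts directly to a 2-D integer array.
equal_arr = np.array(equal.values.tolist())

# Ragged lists: pd.DataFrame pads the short rows with NaN, so the result is float.
ragged_arr = pd.DataFrame(ragged.values.tolist()).values

# Optional (assumes 0 is an acceptable padding value): fill the NaN padding
# with 0 and cast back to integers.
padded = pd.DataFrame(ragged.values.tolist()).fillna(0).astype(np.int64).values

print(equal_arr.shape)   # (2, 3)
print(ragged_arr.shape)  # (3, 4)
print(padded)
```

The width of the ragged result is the length of the longest list, so a column whose longest list has 144 entries would come out with 144 columns.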

First convert each row from type object to numpy array

train["question1"] = train["question1"].apply(lambda x: np.array(x, dtype=np.float32))

Then convert pandas column to numpy array:

train_array = train["question1"].to_numpy()

Then stack the array of arrays into a single 2-D array (this requires every row to have the same length):

train_array = np.stack(train_array)

Answered By: YScharf
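
As a rough illustration of the three steps in this answer, the sketch below runs them on a hypothetical two-row frame named `train` (not the real data) and then shows what happens when the rows differ in length.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real `train`.
train = pd.DataFrame({"question1": [[198, 87, 42], [103, 33, 393]]})

# Step 1: turn each Python list into a float32 array.
col = train["question1"].apply(lambda x: np.array(x, dtype=np.float32))

# Step 2: get a 1-D object array whose elements are the per-row arrays.
train_array = col.to_numpy()

# Step 3: stack the per-row arrays into one 2-D array.
stacked = np.stack(train_array)

print(train_array.shape)  # (2,)
print(stacked.shape)      # (2, 3)
print(stacked.dtype)      # float32

# np.stack needs equally shaped inputs; ragged rows raise a ValueError.
ragged = [np.array([1, 2, 3]), np.array([4, 5])]
try:
    np.stack(ragged)
except ValueError as err:
    print("np.stack failed:", err)
```

If the rows are not all the same length, they need to be padded or truncated to a common width before stacking.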