NumPy ndarray of ndarray of float64 not flattening

Question:

I have a pandas dataframe with a column called ‘corr’. Each row contains an ndarray of float64. The following code is giving me issues:

import numpy as np
import pandas as pd
experimentDataFrame = pd.DataFrame({'corr': [np.array([1.0,2.0]),np.array([3.0,4.0]),np.array([5.0,6.0])]})
corr = experimentDataFrame['corr'].to_numpy(copy=True)
print ([type(corr), corr.shape])
print ([type(corr[0]), corr[0].shape])
print ([type(corr[0][0]), corr[0][0].shape])
corr = corr.flatten()
print ([type(corr), corr.shape])
print ([type(corr[0]), corr[0].shape])
print ([type(corr[0][0]), corr[0][0].shape])

The output of which is

[<class 'numpy.ndarray'>, (3,)]
[<class 'numpy.ndarray'>, (2,)]
[<class 'numpy.float64'>, ()]
[<class 'numpy.ndarray'>, (3,)]
[<class 'numpy.ndarray'>, (2,)]
[<class 'numpy.float64'>, ()]

I’ve also tried corr.ravel() and corr.reshape(-1) instead of flatten, with no difference. I’ve tried corr.reshape(6) as well, but I get: ValueError: cannot reshape array of size 35 into shape (6,).

What I’m expecting is that after flattening, corr[0] should be a float64 and not still an ndarray. My strong suspicion is that since corr is an ndarray of ndarrays of unknown length, flatten (and the rest) doesn’t work. Is there a function that will work without iterating manually?

Asked By: Eric Stimpson


Answers:

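The problem is that experimentDataFrame['corr'].to_numpy(copy=True) is already flat: it is one-dimensional, with shape (35,) on your real data ((3,) in the example you posted). What you have is a dtype=object array whose elements are themselves ndarrays, so flatten, ravel and reshape(-1) have nothing left to flatten.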
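One way to confirm this is to inspect the dtype and number of dimensions directly. The following is a minimal sketch using the sample data from the question; the names df and flat are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question: each cell holds its own ndarray.
df = pd.DataFrame({'corr': [np.array([1.0, 2.0]),
                            np.array([3.0, 4.0]),
                            np.array([5.0, 6.0])]})
corr = df['corr'].to_numpy(copy=True)

print(corr.dtype)   # object
print(corr.ndim)    # 1

# flatten() only collapses the outer array's dimensions, and there is
# just one, so the elements are still the inner ndarrays.
flat = corr.flatten()
print(flat.shape)                        # (3,)
print(isinstance(flat[0], np.ndarray))   # True
```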

You just want something like:

corr = np.concatenate([arr.ravel() for arr in experimentDataFrame['corr']])

If all the inner arrays in your column are already flat, you can possibly just do:

corr = np.concatenate(experimentDataFrame['corr'].tolist())

It isn’t clear from your question whether that is the case, but either of those should work.
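To illustrate why the ravel() version is the safer choice when the inner arrays are not all flat, here is a small sketch. The list cells is a hypothetical stand-in for the column's contents (it is not from the question) and mixes a 2-D array with 1-D arrays of different lengths.

```python
import numpy as np

# Hypothetical cell contents: one 2-D array and two 1-D arrays
# of different lengths.
cells = [np.array([[1.0, 2.0], [3.0, 4.0]]),
         np.array([5.0]),
         np.array([6.0, 7.0, 8.0])]

# ravel() flattens each inner array first, so concatenate only ever
# sees 1-D inputs and can join them end to end.
flat = np.concatenate([arr.ravel() for arr in cells])

print(flat)        # [1. 2. 3. 4. 5. 6. 7. 8.]
print(flat.dtype)  # float64
print(flat.shape)  # (8,)
```

Passing cells straight to np.concatenate would fail here, because the inputs do not all have the same number of dimensions.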

EDIT:

And actually, you don’t need .tolist, just:

corr = np.concatenate(experimentDataFrame['corr']) 

works.
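Putting it together with the sample data from the question, a minimal end-to-end sketch might look like this. It uses the .tolist() variant; the name df is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'corr': [np.array([1.0, 2.0]),
                            np.array([3.0, 4.0]),
                            np.array([5.0, 6.0])]})

# Join the per-row arrays into a single 1-D float64 array.
corr = np.concatenate(df['corr'].tolist())

print(corr)           # [1. 2. 3. 4. 5. 6.]
print(corr.shape)     # (6,)
print(type(corr[0]))  # <class 'numpy.float64'>
```

After this, corr[0] is a float64 scalar rather than an ndarray, which is what the question was expecting from flatten.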

Answered By: juanpa.arrivillaga