Pandas Multi-Index DataFrame to Numpy Ndarray

Question:

I am trying to convert a multi-index pandas DataFrame into a numpy.ndarray. The DataFrame is below:

               s1  s2   s3   s4
Action State                   
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0

I would like the resulting numpy.ndarray to be the following with np.shape() = (2,2,4):

[[[ 0.0  0.0  0.8  0.2 ]
  [ 0.1  0.0  0.9  0.0 ]]

 [[ 0.0  0.0  0.9  0.1 ]
  [ 0.0  0.0  1.0  0.0]]]

I have tried df.as_matrix() but this returns:

 [[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]
  [ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]

How do I return a list of lists for the first level with each list representing an Action records.

Asked By: sccrthlt

||

Answers:

One way

In [151]: df.groupby(level=0).apply(lambda x: x.values.tolist()).values
Out[151]:
array([[[0.0, 0.0, 0.8, 0.2], 
        [0.1, 0.0, 0.9, 0.0]],
       [[0.0, 0.0, 0.9, 0.1],
        [0.0, 0.0, 1.0, 0.0]]], dtype=object)
Answered By: Zero

You could use the following:

dim = len(df.index.get_level_values(0).unique())
result = df.values.reshape((dim1, dim1, df.shape[1]))
print(result)
[[[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]]

 [[ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]]

The first line just finds the number of groups that you want to groupby.

Why this (or groupby) is needed: as soon as you use .values, you lose the dimensionality of the MultiIndex from pandas. So you need to re-pass that dimensionality to NumPy in some way.

Answered By: Brad Solomon

Using Divakar’s suggestion, np.reshape() worked:

>>> print(P)

              s1  s2   s3   s4
Action State                   
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0

>>> np.reshape(P,(2,2,-1))

[[[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]]

 [[ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]]

>>> np.shape(P)

(2, 2, 4)
Answered By: sccrthlt

Elaborating on Brad Solomon’s answer, to get a sligthly more generic solution – indexes of different sizes and an unfixed number of indexes – one could do something like this:

def df_to_numpy(df):
    try:
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        shape.append(ncol)
    return df.to_numpy().reshape(shape)

If df has missing sub-indexes reshape will not work. One way to add them would be (maybe there are better solutions):

def enforce_df_shape(df):
    try:
        ind = pd.MultiIndex.from_product([level.values for level in df.index.levels])
    except AttributeError:
        return df
    fulldf = pd.DataFrame(-1, columns=df.columns, index=ind)  # remove -1 to fill fulldf with nan
    fulldf.update(df)
    return fulldf
Answered By: Vadb

If you are just trying to pull out one column, say s1, and get an array with shape (2,2) you can use the .index.levshape like this:

x = df.s1.to_numpy().reshape(df.index.levshape)

This will give you a (2,2) containing the value of s1.

Answered By: Tom Johnson
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.