# Pandas Multi-Index DataFrame to Numpy Ndarray

## Question:

I am trying to convert a multi-index pandas `DataFrame` into a `numpy.ndarray`. The DataFrame is below:

```
               s1  s2   s3   s4
Action State
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0
```

I would like the resulting `numpy.ndarray` to be the following, with shape `(2, 2, 4)`:

```
[[[ 0.0 0.0 0.8 0.2 ]
  [ 0.1 0.0 0.9 0.0 ]]
 [[ 0.0 0.0 0.9 0.1 ]
  [ 0.0 0.0 1.0 0.0 ]]]
```

I have tried `df.as_matrix()`, but this returns:

```
[[ 0.   0.   0.8  0.2]
 [ 0.1  0.   0.9  0. ]
 [ 0.   0.   0.9  0.1]
 [ 0.   0.   1.   0. ]]
```

How do I return a list of lists for the first level, with each inner list holding the records for one `Action`?

## Answers:

One way is to group by the first index level:

```
In [151]: df.groupby(level=0).apply(lambda x: x.values.tolist()).values
Out[151]:
array([[[0.0, 0.0, 0.8, 0.2],
        [0.1, 0.0, 0.9, 0.0]],
       [[0.0, 0.0, 0.9, 0.1],
        [0.0, 0.0, 1.0, 0.0]]], dtype=object)
```

You could use the following:

```
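# Number of groups in the first index level (one per Action)
dim1 = len(df.index.get_level_values(0).unique())
# -1 lets NumPy infer the number of rows in each group
result = df.values.reshape((dim1, -1, df.shape[1]))
print(result)
[[[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]]
 [[ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]]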
```

The first line finds the number of groups in the first index level, which is what you would otherwise group by.

Why this (or `groupby`) is needed: as soon as you use `.values`, you lose the dimensionality of the MultiIndex from pandas. So you need to pass that dimensionality back to NumPy in some way.
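To make that concrete, here is a minimal sketch that rebuilds the question's DataFrame and compares the shape before and after reshaping. The construction code is an assumption (the question only shows the printed frame), and the approach assumes every `Action` has the same set of `State` rows, sorted by index.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (construction is illustrative).
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0.0, 0.8, 0.2],
     [0.1, 0.0, 0.9, 0.0],
     [0.0, 0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# The MultiIndex structure is dropped here: one row per (Action, State) pair.
flat = df.to_numpy()

# levshape gives the number of distinct values in each index level.
n_actions, n_states = df.index.levshape

# Hand the per-level sizes back to NumPy; -1 infers the column count.
cube = flat.reshape(n_actions, n_states, -1)
```

Here `flat.shape` is `(4, 4)` and `cube.shape` is `(2, 2, 4)`, so `cube[i]` holds all rows for the i-th `Action`.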

Using Divakar’s suggestion, `np.reshape()` worked:

```
>>> print(P)
               s1  s2   s3   s4
Action State
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0
>>> P = np.reshape(P.values, (2, 2, -1))  # keep the reshaped array
>>> print(P)
[[[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]]
 [[ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]]
>>> np.shape(P)
(2, 2, 4)
```

Elaborating on Brad Solomon’s answer, to get a slightly more generic solution that handles index levels of different sizes and an arbitrary number of levels, one could do something like this:

```
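def df_to_numpy(df):
    try:
        # One dimension per MultiIndex level
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        # A flat Index has no .levels
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        # Keep the columns as the last dimension
        shape.append(ncol)
    return df.to_numpy().reshape(shape)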
```
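As a quick check of the idea with levels of different sizes, the sketch below builds a hypothetical frame with a 2 x 3 MultiIndex and two columns (the data and names are made up for illustration). It uses `df.index.levshape`, which reports the same per-level lengths that the list comprehension above computes.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 2 actions x 3 states, 2 value columns.
index = pd.MultiIndex.from_product(
    [["a", "b"], [0, 1, 2]], names=["action", "state"]
)
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=["x", "y"])

# One dimension per index level, plus the columns as the last dimension.
shape = list(df.index.levshape) + [df.shape[1]]
arr = df.to_numpy().reshape(shape)

# arr[i, j] is the row for the i-th action and the j-th state.
row_b1 = arr[1, 1]
```

With this layout, `row_b1` holds the same values as `df.loc[("b", 1)]`.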

If `df` has missing sub-indexes, `reshape` will not work. One way to add them (there may be better solutions):

```
import pandas as pd

def enforce_df_shape(df):
    try:
        # Full Cartesian product of the values in each index level
        ind = pd.MultiIndex.from_product([level.values for level in df.index.levels])
    except AttributeError:
        # A flat Index has nothing to fill in
        return df
    fulldf = pd.DataFrame(-1, columns=df.columns, index=ind)  # remove -1 to fill fulldf with nan
    fulldf.update(df)
    return fulldf
```
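An alternative sketch, not taken from the answer above: `DataFrame.reindex` can add the missing index combinations directly. The frame below is hypothetical, with the `(2, "s2")` row deliberately left out, and it assumes the index levels contain only values that should appear in the result.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one (Action, State) combination missing.
index = pd.MultiIndex.from_tuples(
    [(1, "s1"), (1, "s2"), (2, "s1")], names=["Action", "State"]
)
df = pd.DataFrame({"s1": [0.0, 0.1, 0.0], "s2": [1.0, 0.9, 1.0]}, index=index)

# Full Cartesian product of the existing level values.
full_index = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)

# Missing rows become NaN; pass fill_value=... to use another placeholder.
full = df.reindex(full_index)

# Every combination now exists, so the reshape succeeds.
arr = full.to_numpy().reshape(*full_index.levshape, -1)
```

The missing combination shows up as a row of NaN at `arr[1, 1]`, which keeps it distinguishable from real data.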

If you are just trying to pull out one column, say `s1`, and get an array with shape `(2, 2)`, you can use `.index.levshape` like this:

```
x = df.s1.to_numpy().reshape(df.index.levshape)
```

This will give you a `(2, 2)` array containing the values of `s1`.
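A self-contained sketch of the single-column case, using a cut-down version of the question's frame (the construction code is an assumption). It also shows `Series.unstack` as a possible alternative that pivots the `State` level into columns and should produce the same layout when the index is complete and sorted.

```python
import pandas as pd

# Cut-down version of the question's frame, rebuilt for illustration.
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    {"s1": [0.0, 0.1, 0.0, 0.0], "s3": [0.8, 0.9, 0.9, 1.0]}, index=index
)

# Rows follow Action, columns follow State.
x = df["s1"].to_numpy().reshape(df.index.levshape)

# Alternative: pivot the State level into columns, then convert.
y = df["s1"].unstack("State").to_numpy()
```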