Unwrap numpy array stored in single cell in dataframe to rows

Question:

I have a pandas DataFrame in which 1D NumPy arrays are stored in single cells, so each full array occupies only one cell. There are also other columns with single values, although I don't think that should matter.

My question is how I can, somewhat efficiently, unravel/unwrap the arrays and put their values into rows. I have several columns that I would like to unwrap like this.

I can access the individual numbers by using i as an index:

df['column1'].iloc[0][i]

but there must be a smarter way to unwrap all the values than looping through everything and inserting them individually.

The dataframe looks as follows. Some of the arrays are horizontal and some are vertical.

    column1            column2           column3
0   [0.012, 0.07, ...] [1.23, 1.92, ...] [132, 542, ...]

The desired output is

   column1 column2 column3
0  0.012   1.23    132
1  0.07    1.92    542
2  ...     ...     ...

Edit:
After using Ian’s solution I get the following output where all the numbers have been put into rows but the "formatting" from the numpy array remains. How can I avoid that?

   column1  column2  column3
0  [0.012]  [1.23]   [132]
1  [0.07]   [1.92]   [542]
2  ...      ...      ...

Edit2: This is how the data looks in the original df. The reason is that it's a vertical NumPy array. If I call .shape on one of the columns, it returns (1,).

   column1             column2            column3
0  [[0.012]\n [0.07]]  [[1.23]\n [1.92]]  [[132]\n [542]]

Answers:

All arrays have the same length

# python 3.10.6
import numpy as np  # 1.23.4
import pandas as pd  # 1.5.1

# setup
df = pd.DataFrame.from_records(
    data=np.random.random((1, 3, 2)).round(3),
).add_prefix("column")

print(df)
          column0         column1         column2
0  [0.271, 0.544]  [0.579, 0.329]  [0.732, 0.305]

out = pd.concat([df[col].explode(ignore_index=True) for col in df],
                axis="columns")

print(out)
  column0 column1 column2
0   0.271   0.579   0.732
1   0.544   0.329   0.305
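
One thing worth checking after this step is the dtype: Series.explode on a column of arrays normally leaves the result as object dtype, which can be slow or surprising in later arithmetic. The following is a minimal sketch, not part of the original answer, that casts the exploded frame to float. The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed): one row, each cell holds a 1D array.
df = pd.DataFrame({
    "column0": [np.array([0.271, 0.544])],
    "column1": [np.array([0.579, 0.329])],
})

# Same explode-and-concat pattern as in the answer.
out = pd.concat(
    [df[col].explode(ignore_index=True) for col in df],
    axis="columns",
)
print(out.dtypes)  # the exploded columns are still object dtype

# Cast to a numeric dtype once the values are scalars.
numeric = out.astype(float)
print(numeric.dtypes)
```

If only some columns hold numbers, casting column by column (for example with pd.to_numeric) may be the safer choice.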

Some arrays have different lengths

# setup
df = pd.concat([pd.DataFrame.from_records(np.random.random((1, 1, n)).round(3),
                                          columns=[f"column{c}"]) for c, n in enumerate(range(2, 5))],
               axis="columns")

print(df)
          column0                column1                       column2
0  [0.111, 0.691]  [0.215, 0.981, 0.605]  [0.696, 0.121, 0.531, 0.835]

out = pd.concat([df[col].explode(ignore_index=True) for col in df],
                axis="columns")

print(out)
  column0 column1 column2
0   0.111   0.215   0.696
1   0.691   0.981   0.121
2     NaN   0.605   0.531
3     NaN     NaN   0.835
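
Vertical arrays (from Edit2 in the question)

The question's second edit suggests that each cell holds a column vector, i.e. an array of shape (n, 1). In that case explode yields one-element arrays such as [0.012] instead of scalars. (The (1,) reported by .shape is most likely the shape of the one-row column, not of the array inside it.) A possible workaround, sketched here under that assumption, is to flatten every cell with np.ravel before exploding. The helper cell() is hypothetical and exists only to build the sample data safely.

```python
import numpy as np
import pandas as pd


def cell(values):
    """Hypothetical helper: put one array into a length-1 object array
    so that pandas stores the whole array in a single cell."""
    holder = np.empty(1, dtype=object)
    holder[0] = np.asarray(values)
    return holder


# Assumed layout: one row, each cell is a column vector of shape (2, 1).
df = pd.DataFrame({
    "column1": cell([[0.012], [0.07]]),
    "column2": cell([[1.23], [1.92]]),
    "column3": cell([[132], [542]]),
})

# Flatten each cell to 1D first, then explode as in the answer above.
flat = pd.concat(
    [df[col].map(np.ravel).explode(ignore_index=True) for col in df],
    axis="columns",
)
print(flat)
```

np.ravel also leaves arrays that are already 1D unchanged, so the same expression should work when horizontal and vertical arrays are mixed.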

Answered By: Ian Thompson