Unwrap numpy array stored in single cell in dataframe to rows
Question:
I have a pandas DataFrame in which some cells hold an entire 1D numpy array, so each array occupies a single cell. There are also other columns with single values, although I don’t think that should matter.
How can I, reasonably efficiently, unwrap these arrays so that each element ends up in its own row? I have several columns that I would like to unwrap like this.
I can access the individual numbers by using i as an index:
df['column1'].iloc[0][i]
but there must be a smarter way than looping over everything and inserting the values one by one.
The dataframe looks as follows. Some of the arrays are horizontal and some are vertical.
              column1            column2          column3
0  [0.012, 0.07, ...]  [1.23, 1.92, ...]  [132, 542, ...]
The desired output is
   column1  column2  column3
0    0.012     1.23      132
1     0.07     1.92      542
2      ...      ...      ...
Edit:
After using Ian’s solution I get the following output where all the numbers have been put into rows but the "formatting" from the numpy array remains. How can I avoid that?
   column1  column2  column3
0  [0.012]   [1.23]    [132]
1   [0.07]   [1.92]    [542]
2      ...      ...      ...
Edit2: This is how the data looks in the original df. The reason is that each cell holds a vertical numpy array. If I call .shape on one of the columns it returns (1,).
              column1            column2          column3
0  [[0.012]\n [0.07]]  [[1.23]\n [1.92]]  [[132]\n [542]]
Answers:
All arrays have the same length
# python 3.10.6
import numpy as np # 1.23.4
import pandas as pd # 1.5.1
# setup
df = pd.DataFrame.from_records(
    data=np.random.random((1, 3, 2)).round(3),
).add_prefix("column")
print(df)
          column0         column1         column2
0  [0.271, 0.544]  [0.579, 0.329]  [0.732, 0.305]
# explode each column on its own, then glue the results back together
out = pd.concat([df[col].explode(ignore_index=True) for col in df],
                axis="columns")
print(out)
  column0 column1 column2
0   0.271   0.579   0.732
1   0.544   0.329   0.305
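The question also mentions other columns that hold single values. When every array in a row has the same length, `DataFrame.explode` (pandas 1.3 or newer) accepts a list of columns and repeats the remaining columns for each new row. A minimal sketch, assuming a hypothetical frame with two array columns and a scalar `label` column that is not part of the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two array columns of equal length plus a scalar column.
df = pd.DataFrame({
    "column1": [np.array([0.012, 0.07])],
    "column2": [np.array([1.23, 1.92])],
    "label": ["a"],
})

# Explode both array columns together; "label" is repeated for each new row.
out = df.explode(["column1", "column2"], ignore_index=True)

# explode leaves the exploded columns with object dtype, so cast them explicitly.
out = out.astype({"column1": float, "column2": float})
print(out)
```

Multi-column explode raises an error if the arrays in a row differ in length, so for ragged data the per-column `pd.concat` approach is the safer choice.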
Some arrays have different lengths
# setup
df = pd.concat([pd.DataFrame.from_records(np.random.random((1, 1, n)).round(3),
                                          columns=[f"column{c}"])
                for c, n in enumerate(range(2, 5))],
               axis="columns")
print(df)
          column0                column1                       column2
0  [0.111, 0.691]  [0.215, 0.981, 0.605]  [0.696, 0.121, 0.531, 0.835]
# shorter columns are padded with NaN when the exploded Series are aligned
out = pd.concat([df[col].explode(ignore_index=True) for col in df],
                axis="columns")
print(out)
  column0 column1 column2
0   0.111   0.215   0.696
1   0.691   0.981   0.121
2     NaN   0.605   0.531
3     NaN     NaN   0.835
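Regarding the edits in the question: if a cell holds a "vertical" array of shape (n, 1), `explode` iterates over its first axis and yields one-element arrays, which is most likely where the leftover `[0.012]` brackets come from. A possible fix is to flatten every cell with `np.ravel` before exploding. The sketch below assumes a hypothetical frame built to mimic Edit2. `as_cell` is an illustrative helper, not a pandas or numpy function.

```python
import numpy as np
import pandas as pd


def as_cell(values):
    """Wrap an (n, 1) column vector in a one-element object array (one cell)."""
    cell = np.empty(1, dtype=object)
    cell[0] = np.asarray(values, dtype=float).reshape(-1, 1)
    return cell


# Hypothetical frame mimicking Edit2: each cell holds a "vertical" (n, 1) array.
df = pd.DataFrame({
    "column1": as_cell([0.012, 0.07]),
    "column2": as_cell([1.23, 1.92]),
})

print(df["column1"].shape)          # shape of the column itself: (1,)
print(df["column1"].iloc[0].shape)  # shape of the array in the cell: (2, 1)

# Flatten each cell to 1D first, then explode and cast away the object dtype.
out = pd.concat(
    [df[col].map(np.ravel).explode(ignore_index=True) for col in df],
    axis="columns",
).astype(float)
print(out)
```

Note that `.shape` on a column reports the number of rows in the column, not the shape of the array stored inside a cell, which is why it returns (1,) in the question.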