Pandas regex : Split String column into multiple integer columns

Question:

I’m completely new to regex and i’m facing this challenge that is taking me hours to solve.
I have the following dataframe with a string column "Dimensions":

df
                 Dimensions
0                           "Width:2 cm"
1      "Diameter: 1.2 cm, Height: 10 cm"
2        "Diameter: 3.4cm, Volume: 10cm"

I’m looking for a way to "split" the dimensions column into multiple columns :

desired_df

             Width      Diameter      Height    Volume
0              2          null         null       null
1             null         1.2          10        null
2             null         3.4         null        10

Note that the columns order don’t matter.
If you have any hints that could help me I would appreciate it, thanks !

Asked By: chessPL

||

Answers:

Try:

x = df["Dimensions"].str.extractall(r'([^s"]+)s*:s*(d+.?d*)').droplevel(1)
x = x.pivot(columns=0, values=1)
x.columns.name = None
print(x)

Prints:

  Diameter Height Volume Width
0      NaN    NaN    NaN     2
1      1.2     10    NaN   NaN
2      3.4    NaN     10   NaN
Answered By: Andrej Kesely
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.