Python pandas regex extract to 4 new columns

Question:

import pandas as pd
df = pd.DataFrame(data={'data': ['2 (B) - 15 (K)']})
print(df)

Current DataFrame:

             data
0  2 (B) - 15 (K)

What I am looking to do is extract 2, B, 15 and K into 4 new columns within the same DataFrame.

Is that possible using pandas regex functionality directly?

Answers:

If every row follows the same space-parenthesis-dash pattern (or is an empty string), this works:

df = pd.DataFrame(data={'data': ['2 (B) - 15 (K)', '']})
print(df['data'].str.extract(r'(\d*).\((.)\).-.(\d*).\((.)\)'))
#      0    1    2    3
# 0    2    B   15    K
# 1  NaN  NaN  NaN  NaN
Answered By: Ben.T
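
As a follow-up to the str.extract approach above, here is a minimal sketch of writing the four groups back into the same DataFrame. It assumes every non-empty row follows the "number (letter) - number (letter)" layout. The column names num1, let1, num2 and let2 are illustrative and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame(data={'data': ['2 (B) - 15 (K)', '']})

# Named groups become the column names of the extracted frame.
# num1, let1, num2 and let2 are hypothetical names chosen for this sketch.
pattern = r'(?P<num1>\d+) \((?P<let1>\w)\) - (?P<num2>\d+) \((?P<let2>\w)\)'

extracted = df['data'].str.extract(pattern)

# join aligns on the index, so rows that did not match get NaN in the new columns.
df = df.join(extracted)
print(df)
```

The extracted values are strings. If the numeric columns are needed as numbers, pd.to_numeric can be applied to them afterwards.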

You can extract all characters that are numeric or alphabetical using str.extractall, then unstack the result:

>>> df.data.str.extractall(r"([A-Za-z0-9]+)").unstack()

       0
match  0  1   2  3
0      2  B  15  K

To re-assign the extracted values to the original dataframe, you can use:

df[["col1", "col2", "col3", "col4"]] = df.data.str.extractall(r"([A-Za-z0-9]+)").unstack()
Answered By: sacuL
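
A variation on the extractall approach above, sketched under the assumption that every row yields exactly four alphanumeric tokens. It selects the single capture-group column before unstacking, so the result has plain rather than two-level columns. The second sample row is made up for illustration.

```python
import pandas as pd

# The second row is a hypothetical extra sample, not from the original question.
df = pd.DataFrame(data={'data': ['2 (B) - 15 (K)', '10 (A) - 7 (Z)']})

# extractall returns one row per match, indexed by (original index, match number).
tokens = df['data'].str.extractall(r'([A-Za-z0-9]+)')

# Take capture group 0 as a Series, then pivot the match number into columns.
wide = tokens[0].unstack()
wide.columns = ['col1', 'col2', 'col3', 'col4']

# Attach the new columns to the original frame by index.
df = df.join(wide)

# Extracted values are strings; convert the numeric ones explicitly.
df['col1'] = pd.to_numeric(df['col1'])
df['col3'] = pd.to_numeric(df['col3'])
print(df)
```

Rows with fewer than four tokens would end up with NaN in the trailing columns, and rows with more would break the column assignment, so this only fits data as regular as the example.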