Pandas Dataframe: Extract info from specific series

Question:

I have a dataframe from which I need to extract package info (ML, KG, PZA, LT, UN, etc.) from the Description column, and I'm pretty new to pandas.
This is the dataframe right now:

SKU Description
1 TRIDENT 6S SANDIA 9GR
2 CANAST RABBIT F1 A 1UN
3 HAND SOAP VITAMIN E 442 ML.

I need to extract 9GR, 1UN, 442 ML, etc. and put them into another column. Only values that match a list of accepted units should go into the Package column. These are the possible values:

[GR, UN, LT, OZ]

Any substring of the Description column that matches one of these should be moved into the Package column and removed from the Description column.

Asked By: bittrago

||

Answers:

You can use this regex:

pkg = ['ML', 'KG', 'PZA', 'LT', 'UN', 'GR']

# \b = word boundary, \d+ = digits, \s* = optional whitespace before the unit
df['package'] = df['Description'].str.extract(fr"\b(\d+\s*(?:{'|'.join(pkg)}))\b")
print(df)

# Output
   SKU                  Description package
0    1        TRIDENT 6S SANDIA 9GR     9GR
1    2       CANAST RABBIT F1 A 1UN     1UN
2    3  HAND SOAP VITAMIN E 442 ML.  442 ML
Answered By: Corralien
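
The question also asks for the matched text to be removed from the Description column, which the answer above does not cover. A minimal sketch of one way to do both steps, assuming the sample data from the question and the same unit list; the `units` and `pattern` names, the `re.escape` call, and the handling of a trailing period are additions for illustration, not part of the original answer:

```python
import re
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "SKU": [1, 2, 3],
    "Description": [
        "TRIDENT 6S SANDIA 9GR",
        "CANAST RABBIT F1 A 1UN",
        "HAND SOAP VITAMIN E 442 ML.",
    ],
})

pkg = ["ML", "KG", "PZA", "LT", "UN", "GR"]

# Escape each unit in case a future entry contains regex metacharacters
units = "|".join(map(re.escape, pkg))
pattern = rf"\b(\d+\s*(?:{units}))\b"

# Step 1: copy the first match into its own column (NaN where nothing matches)
df["package"] = df["Description"].str.extract(pattern, expand=False)

# Step 2: remove every match from the description, together with the
# whitespace before it and an optional trailing period
df["Description"] = (
    df["Description"]
    .str.replace(rf"\s*{pattern}\.?", "", regex=True)
    .str.strip()
)

print(df)
```

Note that `str.extract` keeps only the first match per row, while `str.replace` removes every match, so a description containing two package sizes would lose both but record only one.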