How to select and store string values in pandas using regex

Question:

I want to select and store string values(regex) into variable. Here I want to extract USD, 1000.00, 21/01/2022 from Name and store into separate variables for amount = 1000.0, date = 21/01/2022 and currency = USD respectively.

import pandas as pd

data_dict = {
            'Name': ['I paid USD 1000.00 on 21/01/2022 for car service']
          }
df = pd.DataFrame(data_dict)
Asked By: AK007

||

Answers:

If the order is consistent and the currency always right before the amount, a simple regex with str.extract should work:

df['Name'].str.extract(r'(?P<currency>[A-Z]{3})s*(?P<amount>d+(?:.d+)).*(?P<date>dd/dd/d{4})')

output:

  currency   amount        date
0      USD  1000.00  21/01/2022
Notes
  • This won’t validate the date (99/00/9876 would match)
  • This assumes the currency is 3 uppercase letters (you can easily adapt)
  • The type of all columns is string, if you plan to use the amount or date for operations you must convert them

variant: numbers with decimal separator

out = df['Name'].str.extract(r'(?P<currency>[A-Z]{3})s*(?P<amount>d+(?:,d{3})*(?:.d+)).*(?P<date>dd/dd/d{4})')

example:

  currency      amount        date
0      USD     1000.00  21/01/2022
1      USD  100,000.00  21/01/2022
Answered By: mozway
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.