Count number of characters between two specific characters in dataframe

Question:

I’m trying to extract the date into a new column from a column containing a string in a dataframe using Python.

The field always starts with [SOME_TEXT]_DD/MM/YYYY

SOME_TEXT is of variable length

The characters [ and ] may or may not appear again later in the string

I am able to extract the date where [SOME_TEXT]_ is 18 characters long like this:-

df[‘date’] = MID(df[‘Column_with_date’],18,10)

So I figure if I know the length of [SOME_TEXT] I can adapt the above line.

How can I calculate the length of [SOME_TEXT]?

Asked By: Ellen Coggin

||

Answers:

We can use str.extract for a regex option:

df["date"] = df["Column_with_date"].str.extract(r'[.*?]_(d{2}/d{2}/d{4})')

If additionally you want the date column to be a bona fide datetime column rather than text, then use pd.to_datetime():

df["date"] = pd.to_datetime(df["date"], format='%d/%m/%Y')
Answered By: Tim Biegeleisen
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.