Cleaning a date column in Python with multiple date formats

Question:

I am cleaning up a dataframe that stores date of birth and date of death as strings. Those columns contain dates in multiple formats, and some rows contain just the year (which is all I need). These are the formats:

Jan 10 2020
1913 
10/8/2019
June 14th 1980

All I need is the year from each date. I have not had any luck with pandas to_datetime, since a significant portion of the rows contain only a year to begin with.

Is there a way to pull just the year from the strings so that each column looks like:

2020
1913
2019
1980
Asked By: jenna


Answers:

You can use str.extract:

df['BirthDate'] = df['BirthDate'].str.extract(r'(\d{4})', expand=False)
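A minimal sketch of this approach on the sample values from the question, not a definitive implementation. The column name BirthYear, the added missing row, and the conversion to a nullable integer dtype are assumptions made for illustration. Note that str.extract requires a capture group in the pattern, and expand=False makes it return a Series instead of a one-column DataFrame.

```python
import pandas as pd

# Sample values from the question, plus one missing row (added for illustration).
df = pd.DataFrame(
    {"BirthDate": ["Jan 10 2020", "1913 ", "10/8/2019", "June 14th 1980", None]}
)

# The parentheses form the capture group that str.extract needs.
# \b...\b restricts the match to a standalone run of exactly four digits.
years = df["BirthDate"].str.extract(r"\b(\d{4})\b", expand=False)

# Rows without a four-digit run come back as NaN; the nullable Int64
# dtype keeps them as <NA> while the other years stay integers.
df["BirthYear"] = pd.to_numeric(years).astype("Int64")
print(df)
```

This relies on the year always being written with four digits; a value such as 10/8/19 would not match and would end up as a missing value.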
Answered By: Deneb

The simplest way is to use a parser which will accept these and other formats:

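import pandas as pd
from dateutil import parser

df = pd.DataFrame({"mydates": ["Jan 10 2020", "1913", "10/8/2019", "June 14th 1980"]})
# parser.parse infers the format of each string; fields missing from the
# string (month and day for "1913") are filled in from today's date
df['years'] = df['mydates'].apply(parser.parse).dt.strftime('%Y')  # years as strings, e.g. '2020'
print(df)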
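If the real columns also contain missing or unparseable values, parser.parse raises an exception and the apply call fails. A hedged variant of the same idea, assuming you want such rows to become missing values: the helper name year_or_na, the FALLBACK date, and the extra "unknown" row are all hypothetical and only there for illustration.

```python
from datetime import datetime

import pandas as pd
from dateutil import parser

# Fields missing from a string (month and day for "1913") are taken from
# this default instead of today's date, so results do not depend on the run date.
FALLBACK = datetime(2000, 1, 1)


def year_or_na(text):
    """Return the parsed year, or pd.NA when the value cannot be parsed."""
    try:
        return parser.parse(text, default=FALLBACK).year
    except (ValueError, OverflowError, TypeError):
        # ValueError/OverflowError: unparseable string; TypeError: non-string such as None.
        return pd.NA


df = pd.DataFrame(
    {"mydates": ["Jan 10 2020", "1913", "10/8/2019", "June 14th 1980", "unknown"]}
)
# Nullable Int64 keeps the years as integers and the failures as <NA>.
df["years"] = df["mydates"].apply(year_or_na).astype("Int64")
print(df)
```

Taking .year from each parsed value, rather than formatting with strftime, gives integers directly, which is usually more convenient for later arithmetic such as computing age at death.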
Answered By: user19077881