Cleaning a date column in Python with multiple date formats
Question:
I am cleaning up a dataframe that has date of birth and date of death as a string. There are multiple formats of dates in those columns. Some contain just year (which is all I need). These are the formats of dates:
Jan 10 2020
1913
10/8/2019
June 14th 1980
All I need is the year from each date. I have not been having any luck with pandas to_datetime since a significant portion of the rows only have year to begin with.
Is there a way for me to pull just year from the strings so that I can get each column to look like:
2020
1913
2019
1980
Answers:
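You can use str.extract with a regular expression that captures the first run of four digits. The pattern needs the backslash in \d and a capture group:
df['BirthDate'] = df['BirthDate'].str.extract(r'(\d{4})', expand=False)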
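For reference, here is a minimal sketch of the str.extract approach run against the four sample strings from the question. The output column name BirthYear is only an illustrative choice, and the conversion to a nullable integer type is an optional extra, assuming you want numbers rather than strings.

```python
import pandas as pd

# Sample strings taken from the question
df = pd.DataFrame(
    {"BirthDate": ["Jan 10 2020", "1913", "10/8/2019", "June 14th 1980"]}
)

# Capture the first run of four digits; expand=False returns a Series
years = df["BirthDate"].str.extract(r"(\d{4})", expand=False)

# Rows without a four-digit run become missing values instead of raising
# ("BirthYear" is a hypothetical column name used for illustration)
df["BirthYear"] = pd.to_numeric(years, errors="coerce").astype("Int64")

print(df)
```

Note that this takes the first four-digit run in each string, so it relies on the year being the only such run, which holds for the formats listed in the question.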
The simplest way is to use a parser which will accept these and other formats:
import pandas as pd
from dateutil import parser
df = pd.DataFrame({"mydates":["Jan 10 2020", "1913", "10/8/2019", "June 14th 1980"]})
df['years'] = df['mydates'].apply(parser.parse).dt.strftime('%Y')
print(df)
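If some rows might not be parseable at all, one possible variation is to wrap the parser call so that bad values become missing instead of stopping the whole apply. This is only a sketch: the helper name year_or_na and the extra "not a date" row are made up for illustration, and the fixed default date is an assumption added so that a bare year such as "1913" does not depend on today's month and day.

```python
from datetime import datetime

import pandas as pd
from dateutil import parser


def year_or_na(text):
    """Return the year parsed from text, or pd.NA if parsing fails.

    Hypothetical helper for illustration, not part of pandas or dateutil.
    """
    try:
        # A fixed default fills in the month and day when only a year is given
        return parser.parse(text, default=datetime(2000, 1, 1)).year
    except (ValueError, OverflowError, TypeError):
        return pd.NA


df = pd.DataFrame(
    {"mydates": ["Jan 10 2020", "1913", "10/8/2019", "June 14th 1980", "not a date"]}
)
df["years"] = df["mydates"].map(year_or_na)

print(df)
```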