Dealing with dates in pandas

Question:

I have DataFrame columns that consist mainly of dates, but they may also contain arbitrary strings or empty values.
I want to extract the year from a column, but I get an error because of the string values.
Is there a general way to handle this? Not only for getting the year or month, but also for other functions that could fail with the same error.
I would like to understand the nature of the problem and how to deal with it.

The code looks like this:

import numpy as np
import pandas as pd

dates = {'date': ['11/03/2019', '12/05/2021', '', '11/03/2021', 'x'],
         'date2': ['11/04/2019', '12/03/2021', '11/06/2021', np.nan, 'ab'],
}
df2 = pd.DataFrame(dates)
df2['year'] = pd.DatetimeIndex(df2['date']).year

The error message:

Unknown string format: x

Thank you in advance!

Asked By: Zaur Guliyev


Answers:

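You can try this:

import numpy as np
import pandas as pd

dates = {'date': ['11/03/2019', '12/05/2021', '', '11/03/2021', 'x'],
         'date2': ['11/04/2019', '12/03/2021', '11/06/2021', np.nan, 'ab'],
}
df = pd.DataFrame(dates)
# errors="coerce" turns anything that cannot be parsed into NaT instead of raising
df["date"] = pd.to_datetime(df['date'], errors="coerce")
df["date2"] = pd.to_datetime(df['date2'], errors="coerce")

# The .dt accessor works on datetime columns; NaT rows yield NaN
df["year1"] = df["date"].dt.year
df["year2"] = df["date2"].dt.year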

Output:

   date                 date2                year1   year2
0  2019-11-03 00:00:00  2019-11-04 00:00:00  2019.0  2019.0
1  2021-12-05 00:00:00  2021-12-03 00:00:00  2021.0  2021.0
2  NaT                  2021-11-06 00:00:00  nan     2021.0
3  2021-11-03 00:00:00  NaT                  2021.0  nan
4  NaT                  NaT                  nan     nan
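
The years show up as floats (2019.0) because a column containing NaN cannot stay a plain integer column. A minimal sketch of one way around that, assuming the dates are in MM/DD/YYYY order as the output above suggests, is to cast to pandas' nullable "Int64" dtype. The names raw, parsed and years are only illustrative.

```python
import pandas as pd

raw = pd.Series(['11/03/2019', '12/05/2021', '', '11/03/2021', 'x'])

# Assumption: month comes first. Unparseable entries become NaT.
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

# .dt.year gives floats when NaT is present; "Int64" keeps whole numbers
# and represents the missing rows as <NA>.
years = parsed.dt.year.astype("Int64")
print(years)
```

Passing an explicit format also avoids any guessing about whether '11/03/2019' means November 3 or March 11.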

If you don't want any null values in your DataFrame, call df.dropna(inplace=True) before adding the year1 and year2 columns. Note that this drops every row in which either date column is NaT.
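
Before dropping anything, it can help to see which raw values were rejected, since coercing hides the difference between a genuinely empty cell and a typo. The following is a rough sketch rather than a definitive recipe: parse_dates_report is a hypothetical helper name, and the MM/DD/YYYY format is an assumption about the data.

```python
import numpy as np
import pandas as pd

def parse_dates_report(s, fmt="%m/%d/%Y"):
    """Hypothetical helper: return (parsed dates, raw values that failed to parse)."""
    parsed = pd.to_datetime(s, format=fmt, errors="coerce")
    # Treat NaN and blank strings as "missing", not as parse failures
    text = s.fillna("").astype(str).str.strip()
    failed = parsed.isna() & (text != "")
    return parsed, s[failed]

date2 = pd.Series(['11/04/2019', '12/03/2021', '11/06/2021', np.nan, 'ab'])
parsed, rejected = parse_dates_report(date2)
print(rejected)  # the raw values that could not be read as dates
```

The same pattern (convert with errors="coerce", then compare against the original column) works for any function applied afterwards through the .dt accessor, because the conversion step is where the bad values are dealt with.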

Answered By: Zero

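Try the following solution:

df2 = pd.DataFrame(dates)
# Take the characters from position 6 onward (the year) only for 10-character strings such as '11/03/2019'
df2['year'] = [e[6:] if len(e) == 10 else None for e in df2['date']]
df2

Note: this assumes every value in the column is a string. len() raises a TypeError on np.nan, so the same line would fail on the date2 column.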
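
A slightly more defensive variant of this string-based approach can be sketched with Series.str.extract, which returns NaN for missing values and for strings that do not match, instead of raising. This assumes valid dates always look like MM/DD/YYYY; the pattern checks only the shape of the string, not whether the month and day are in range. The names year_text and year are illustrative.

```python
import numpy as np
import pandas as pd

date2 = pd.Series(['11/04/2019', '12/03/2021', '11/06/2021', np.nan, 'ab'])

# Capture the four-digit year only when the whole string matches NN/NN/NNNN
year_text = date2.str.extract(r'^\d{2}/\d{2}/(\d{4})$', expand=False)

# Convert to a nullable integer column; unmatched rows become <NA>
year = pd.to_numeric(year_text).astype("Int64")
print(year)
```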

Answered By: Miykayah10