Converting string dd.mm.yyyy to date format yyyy-MM-dd using Pyspark

Question:

I have a column with dates stored as strings in the format dd.mm.yyyy. I want to convert it to a date in the format yyyy-MM-dd using PySpark. I have tried the following, but it returns null values:

df.withColumn("date_col", to_date("string_col", "yyyy-mmm-dd"))
string_col  date_col
02.11.2008  null
26.02.2021  null
Asked By: f.ivy

Answers:

The format argument must describe the layout of the input string, not the output format you want.
E.g. df.withColumn("date_col", to_date("string_col", "dd.MM.yyyy"))

Make sure the day, month and year are in the right order and that the separator is '.' rather than '-'. Note that the month is an uppercase MM; a lowercase mm means minutes.

See also the docs

Answered By: Tessa I