PySpark: Converting Date String to Datetime Value in 1.6

Question:

I'm working with Spark 1.6 and I want to convert a string value to a datetime value. However, none of the suggestions I have read so far worked; it probably has to do with my old version and the weird datetime representation in my string.

It looks like this:

df = spark.createDataFrame([('2011-11-17-12.00.46.841219',)], ['time'])

df.show(truncate=False)

How do I get a nice yyyy-MM-dd HH:mm:ss datetime format? I tried this, but it did not work.

Best regards

Asked By: Niels


Answers:

You can use to_timestamp() to convert a string to a timestamp with a custom format. Note that to_timestamp() was only added in Spark 2.2, so it is not available on 1.6 itself.

from pyspark.sql.functions import col, to_timestamp

df.withColumn("time", to_timestamp(col("time"), "yyyy-MM-dd-HH.mm.ss.SSSSSS")).show(truncate=False)
Answered By: Robert Kossendey
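
If the goal is specifically a yyyy-MM-dd HH:mm:ss string rather than a timestamp column, date_format() (available since Spark 1.5) can render the parsed value. A minimal sketch on top of the answer above, assuming Spark 2.2+ for to_timestamp() (the ts and ts_pretty column names are just for illustration):

from pyspark.sql.functions import col, date_format, to_timestamp

# Parse the custom string, then render it in the requested format.
(df.withColumn("ts", to_timestamp(col("time"), "yyyy-MM-dd-HH.mm.ss.SSSSSS"))
   .withColumn("ts_pretty", date_format(col("ts"), "yyyy-MM-dd HH:mm:ss"))
   .show(truncate=False))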

You can use from_unixtime(unix_timestamp(<ts_column>, <ts_format>)), but I believe you’ll lose the fraction of seconds in the conversion.

from pyspark.sql import functions as func

(spark.createDataFrame([('2011-11-17-12.00.46.841219',)], ['ts_str'])
    .withColumn('ts',
                func.from_unixtime(func.unix_timestamp('ts_str', 'yyyy-MM-dd-HH.mm.ss')))
    .show(truncate=False))

# +--------------------------+-------------------+
# |ts_str                    |ts                 |
# +--------------------------+-------------------+
# |2011-11-17-12.00.46.841219|2011-11-17 12:00:46|
# +--------------------------+-------------------+
Answered By: samkart
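
On Spark 1.6 itself, where to_timestamp() does not exist, a Python UDF is one way to keep the microseconds. A minimal sketch, assuming a 1.6-style SQLContext named sqlContext (the parse_ts name is just for illustration):

from datetime import datetime

from pyspark.sql.functions import udf
from pyspark.sql.types import TimestampType

# Parse '2011-11-17-12.00.46.841219' into a timestamp; %f consumes
# the six fractional-second digits, so the microseconds survive.
parse_ts = udf(lambda s: datetime.strptime(s, '%Y-%m-%d-%H.%M.%S.%f') if s else None,
               TimestampType())

df = sqlContext.createDataFrame([('2011-11-17-12.00.46.841219',)], ['ts_str'])
df.withColumn('ts', parse_ts(df['ts_str'])).show(truncate=False)

Note that a Python UDF pays per-row serialization overhead, so on large data the built-in functions above are preferable when sub-second precision is not needed.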