Datetime object error while inserting datetime in pyspark dataframe

Question:

I am getting an error while inserting a datetime object into a PySpark DataFrame.

import datetime
import re
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType, TimestampType, ArrayType, BinaryType

filename = 'Rekorder_2022-08-24_14-12-42.mf4'
# Extract the 'YYYY-MM-DD_HH-MM-SS' part of the file name
match = re.search(r'(\d+-\d+-\d+_\d+-\d+-\d+)', filename)
date_time = datetime.datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")

Now, I am trying to insert the date into the following pyspark data structure

    sdf = spark.createDataFrame(date_time, schema = recorder_files)

    recorder_files = StructType([
        StructField('date_time', TimestampType(), False)
    ])

Error Message

TypeError: StructType can not accept object datetime.datetime(2022, 8, 24, 13, 47, 47) in type <class 'datetime.datetime'>

What I tried:

  1. I checked that the time variable is of type datetime

type(date_time)
Out[85]: datetime.datetime

  2. I could see that the schema field is also a timestamp type

print(recorder_files)
StructType([StructField('date_time', TimestampType(), False)])

  3. I also read, in the suggested link, that the "lit" function of PySpark is needed.
    I tried this as well, but it didn't work

date_time = lit(datetime.datetime.strptime(match.group(1),"%Y-%m-%d_%H-%M-%S"))

Asked By: Arun


Answers:

To add this date_time to an existing DataFrame, use withColumn():

from pyspark.sql.functions import lit

df.withColumn("date_time", lit(date_time)).show()

Note: lit() creates a column holding a literal value.
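
For the standalone createDataFrame() call in the question, a likely cause of the TypeError is that the data argument holds a bare datetime where Spark expects a collection of rows, each matching the schema. Below is a minimal sketch of one way around that. It assumes an active SparkSession is passed in as spark. The helper names parse_recording_time and build_recorder_df are hypothetical and not part of PySpark.

```python
import datetime
import re


def parse_recording_time(filename):
    """Extract the timestamp embedded in a name like 'Rekorder_2022-08-24_14-12-42.mf4'."""
    match = re.search(r"(\d+-\d+-\d+_\d+-\d+-\d+)", filename)
    if match is None:
        raise ValueError(f"no timestamp found in {filename!r}")
    return datetime.datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")


def build_recorder_df(spark, filename):
    """Build a one-row DataFrame; `spark` is assumed to be an active SparkSession."""
    # Imported here so the parsing helper above also works without PySpark installed.
    from pyspark.sql.types import StructField, StructType, TimestampType

    recorder_files = StructType([StructField("date_time", TimestampType(), False)])
    # Each row is a tuple with one value per schema field, so the single
    # datetime is wrapped in a one-element tuple inside a list of rows.
    rows = [(parse_recording_time(filename),)]
    return spark.createDataFrame(rows, schema=recorder_files)
```

Calling build_recorder_df(spark, 'Rekorder_2022-08-24_14-12-42.mf4').show() should then display a single date_time row. Wrapping the value in lit() does not help in this case, because lit() returns a Column expression rather than a Python value that can serve as row data.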

Answered By: arudsekaberne