Datetime object error while inserting datetime in pyspark dataframe
Question:
I am getting an error while inserting a datetime object into a PySpark DataFrame.
import datetime
import re
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType, TimestampType, ArrayType, BinaryType
filename = 'Rekorder_2022-08-24_14-12-42.mf4'
match = re.search(r'(\d+-\d+-\d+_\d+-\d+-\d+)', filename)
date_time = datetime.datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")
Now I am trying to insert the date into a DataFrame with the following schema:
recorder_files = StructType([
    StructField('date_time', TimestampType(), False)
])
sdf = spark.createDataFrame(date_time, schema=recorder_files)
Error Message
TypeError: StructType can not accept object datetime.datetime(2022, 8, 24, 13, 47, 47) in type <class 'datetime.datetime'>
What I tried?
- I checked that the time variable is of type datetime
type(date_time)
Out[85]: datetime.datetime
- I confirmed that the schema field is also a timestamp type
print(recorder_files)
StructType([StructField('date_time', TimestampType(), False)])
- I also read, in a suggested link, that the PySpark "lit" function is needed.
I tried this as well, but it didn't work:
date_time = lit(datetime.datetime.strptime(match.group(1),"%Y-%m-%d_%H-%M-%S"))
Answers:
To add this date_time value to an existing DataFrame, use withColumn():
from pyspark.sql.functions import lit
df.withColumn("date_time", lit(date_time)).show()
Note: lit() creates a column holding a literal value.
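If the goal is instead to build a new DataFrame from the value, as in the question, a likely cause of the TypeError is that createDataFrame expects a collection of rows, each row being a tuple (or list, Row, or dict) that matches the schema. A datetime on its own is not a row. The sketch below wraps the value in a list containing a one-element tuple. The helper names parse_recording_time, build_rows and create_recorder_df are hypothetical, and `spark` is assumed to be an already running SparkSession.

```python
import datetime
import re


def parse_recording_time(filename):
    """Extract the timestamp embedded in a recorder file name."""
    match = re.search(r"(\d+-\d+-\d+_\d+-\d+-\d+)", filename)
    if match is None:
        raise ValueError(f"no timestamp found in {filename!r}")
    return datetime.datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")


def build_rows(date_time):
    """Return one row with one column, matching a single-field schema."""
    return [(date_time,)]


def create_recorder_df(spark, date_time):
    """Build a one-row DataFrame; `spark` is an existing SparkSession (assumed)."""
    # Imported lazily so the two helpers above can run without Spark installed.
    from pyspark.sql.types import StructField, StructType, TimestampType

    recorder_files = StructType([StructField("date_time", TimestampType(), False)])
    return spark.createDataFrame(build_rows(date_time), schema=recorder_files)


date_time = parse_recording_time("Rekorder_2022-08-24_14-12-42.mf4")
rows = build_rows(date_time)

# With an active SparkSession named `spark`:
# sdf = create_recorder_df(spark, date_time)
# sdf.show()
```

The trailing comma in (date_time,) matters: without it the parentheses are just grouping and the row is again a bare datetime.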