Minutes to Hours on datetime column Pyspark
Question:
I have a pyspark dataframe with a column datetime
containing : 2022-06-01 13:59:58
I would like to transform that datetime value into : 2022-06-01 14:00:58
Is there a way to round the minutes into hours , when the minutes are 59 min?
Answers:
You can accomplish this using expr
or unix_timestamp
and further adding 1 minute respectively , based on the minute against your timestamp value using when-otherwise
–
Unix Timestamps can get a bit fiddly as it involves an additional step of converting it to epoch, but either ways the end result is the same across both
Data Preparation
s = StringIO("""
date_str
2022-03-01 13:59:50
2022-05-20 13:45:50
2022-06-21 16:59:50
2022-10-22 20:59:50
""")
df = pd.read_csv(s,delimiter=',')
sparkDF = sql.createDataFrame(df)
.withColumn('date_parsed',F.to_timestamp(F.col('date_str'), 'yyyy-MM-dd HH:mm:ss'))
.drop('date_str')
sparkDF.show()
+-------------------+
| date_parsed|
+-------------------+
|2022-03-01 13:59:50|
|2022-05-20 13:45:50|
|2022-06-21 16:59:50|
|2022-10-22 20:59:50|
+-------------------+
Extracting Minute & Addition
sparkDF = sparkDF.withColumn("date_minute", F.minute("date_parsed"))
sparkDF = sparkDF.withColumn('date_parsed_updated_expr',
F.when(F.col('date_minute') == 59,F.col('date_parsed') + F.expr('INTERVAL 1 MINUTE'))
.otherwise(F.col('date_parsed'))
).withColumn('date_parsed_updated_unix',
F.when(F.col('date_minute') == 59,(F.unix_timestamp(F.col('date_parsed')) + 60).cast('timestamp'))
.otherwise(F.col('date_parsed'))
)
sparkDF.show()
+-------------------+-----------+------------------------+------------------------+
| date_parsed|date_minute|date_parsed_updated_expr|date_parsed_updated_unix|
+-------------------+-----------+------------------------+------------------------+
|2022-03-01 13:59:50| 59| 2022-03-01 14:00:50| 2022-03-01 14:00:50|
|2022-05-20 13:45:50| 45| 2022-05-20 13:45:50| 2022-05-20 13:45:50|
|2022-06-21 16:59:50| 59| 2022-06-21 17:00:50| 2022-06-21 17:00:50|
|2022-10-22 20:59:50| 59| 2022-10-22 21:00:50| 2022-10-22 21:00:50|
+-------------------+-----------+------------------------+------------------------+
I have a pyspark dataframe with a column datetime
containing : 2022-06-01 13:59:58
I would like to transform that datetime value into : 2022-06-01 14:00:58
Is there a way to round the minutes into hours , when the minutes are 59 min?
You can accomplish this using expr
or unix_timestamp
and further adding 1 minute respectively , based on the minute against your timestamp value using when-otherwise
–
Unix Timestamps can get a bit fiddly as it involves an additional step of converting it to epoch, but either ways the end result is the same across both
Data Preparation
s = StringIO("""
date_str
2022-03-01 13:59:50
2022-05-20 13:45:50
2022-06-21 16:59:50
2022-10-22 20:59:50
""")
df = pd.read_csv(s,delimiter=',')
sparkDF = sql.createDataFrame(df)
.withColumn('date_parsed',F.to_timestamp(F.col('date_str'), 'yyyy-MM-dd HH:mm:ss'))
.drop('date_str')
sparkDF.show()
+-------------------+
| date_parsed|
+-------------------+
|2022-03-01 13:59:50|
|2022-05-20 13:45:50|
|2022-06-21 16:59:50|
|2022-10-22 20:59:50|
+-------------------+
Extracting Minute & Addition
sparkDF = sparkDF.withColumn("date_minute", F.minute("date_parsed"))
sparkDF = sparkDF.withColumn('date_parsed_updated_expr',
F.when(F.col('date_minute') == 59,F.col('date_parsed') + F.expr('INTERVAL 1 MINUTE'))
.otherwise(F.col('date_parsed'))
).withColumn('date_parsed_updated_unix',
F.when(F.col('date_minute') == 59,(F.unix_timestamp(F.col('date_parsed')) + 60).cast('timestamp'))
.otherwise(F.col('date_parsed'))
)
sparkDF.show()
+-------------------+-----------+------------------------+------------------------+
| date_parsed|date_minute|date_parsed_updated_expr|date_parsed_updated_unix|
+-------------------+-----------+------------------------+------------------------+
|2022-03-01 13:59:50| 59| 2022-03-01 14:00:50| 2022-03-01 14:00:50|
|2022-05-20 13:45:50| 45| 2022-05-20 13:45:50| 2022-05-20 13:45:50|
|2022-06-21 16:59:50| 59| 2022-06-21 17:00:50| 2022-06-21 17:00:50|
|2022-10-22 20:59:50| 59| 2022-10-22 21:00:50| 2022-10-22 21:00:50|
+-------------------+-----------+------------------------+------------------------+