Minutes to hours on a datetime column in PySpark

Question:

I have a PySpark DataFrame with a datetime column containing: 2022-06-01 13:59:58
I would like to transform that value into: 2022-06-01 14:00:58

Is there a way to roll the minutes over into the next hour when the minute value is 59?

Asked By: Sara Stone
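In plain Python terms, the transformation described in the question amounts to adding one minute only when the minute field is 59, which leaves the seconds untouched. A minimal stdlib sketch outside Spark, using a hypothetical helper name roll_over_59:

```python
from datetime import datetime, timedelta


def roll_over_59(ts: datetime) -> datetime:
    """Hypothetical helper: add one minute when the minute is 59, else return ts unchanged."""
    # Adding a timedelta lets datetime carry the overflow into the hour field
    if ts.minute == 59:
        return ts + timedelta(minutes=1)
    return ts


before = datetime(2022, 6, 1, 13, 59, 58)
after = roll_over_59(before)
print(after)  # 2022-06-01 14:00:58
```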


Answers:

You can accomplish this with when-otherwise: check the minute of each timestamp and, when it is 59, add one minute using either expr (an INTERVAL 1 MINUTE literal) or unix_timestamp (epoch seconds plus 60, cast back to a timestamp); otherwise keep the original value.

Unix timestamps can get a bit fiddly, since they involve the extra step of converting to epoch seconds and back, but either way the end result is the same for both approaches.
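Why the two routes agree can be illustrated with a stdlib-only analogy (this is not Spark code): add_minute_interval plays the role of the INTERVAL 1 MINUTE expression, and add_minute_epoch plays the role of unix_timestamp plus 60 cast back to a timestamp. Both helper names are hypothetical, and the sketch assumes UTC-aware datetimes so the epoch round trip is unambiguous.

```python
from datetime import datetime, timedelta, timezone


def add_minute_interval(ts: datetime) -> datetime:
    # Analogous to adding INTERVAL 1 MINUTE to a timestamp
    return ts + timedelta(minutes=1)


def add_minute_epoch(ts: datetime) -> datetime:
    # Analogous to unix_timestamp(ts) + 60, then casting back to a timestamp.
    # int() keeps whole seconds only, mirroring unix_timestamp's integer result.
    epoch_seconds = int(ts.timestamp())
    return datetime.fromtimestamp(epoch_seconds + 60, tz=timezone.utc)


ts = datetime(2022, 3, 1, 13, 59, 50, tzinfo=timezone.utc)
via_interval = add_minute_interval(ts)
via_epoch = add_minute_epoch(ts)
print(via_interval, via_epoch)  # both 2022-03-01 14:00:50+00:00
```

One practical difference: in Spark, unix_timestamp returns whole seconds, so the epoch route drops any fractional seconds, while the interval route keeps them. The sample data has no fractional seconds, so both give identical results here.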

Data Preparation

s = StringIO("""
date_str
2022-03-01 13:59:50
2022-05-20 13:45:50
2022-06-21 16:59:50
2022-10-22 20:59:50
""")

df = pd.read_csv(s,delimiter=',')

sparkDF = sql.createDataFrame(df)
             .withColumn('date_parsed',F.to_timestamp(F.col('date_str'), 'yyyy-MM-dd HH:mm:ss'))
             .drop('date_str')

sparkDF.show()

+-------------------+
|        date_parsed|
+-------------------+
|2022-03-01 13:59:50|
|2022-05-20 13:45:50|
|2022-06-21 16:59:50|
|2022-10-22 20:59:50|
+-------------------+
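The pattern 'yyyy-MM-dd HH:mm:ss' follows Spark's datetime pattern letters, where MM is the month, HH the 24-hour clock and mm the minute. As a rough cross-check outside Spark, the corresponding strptime format is '%Y-%m-%d %H:%M:%S'; the sketch below parses the same sample strings and pulls out the minute values that F.minute should then return.

```python
from datetime import datetime

# Same rows as the date_str column above
samples = [
    "2022-03-01 13:59:50",
    "2022-05-20 13:45:50",
    "2022-06-21 16:59:50",
    "2022-10-22 20:59:50",
]

# '%Y-%m-%d %H:%M:%S' corresponds to Spark's 'yyyy-MM-dd HH:mm:ss'
parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in samples]
minutes = [ts.minute for ts in parsed]
print(minutes)  # [59, 45, 59, 59]
```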

Extracting Minute & Addition

sparkDF = sparkDF.withColumn("date_minute", F.minute("date_parsed"))

sparkDF = sparkDF.withColumn('date_parsed_updated_expr',
    # Approach 1: add a one-minute interval literal when the minute is 59
    F.when(F.col('date_minute') == 59, F.col('date_parsed') + F.expr('INTERVAL 1 MINUTE'))
     .otherwise(F.col('date_parsed'))
).withColumn('date_parsed_updated_unix',
    # Approach 2: convert to epoch seconds, add 60 and cast back to a timestamp
    F.when(F.col('date_minute') == 59, (F.unix_timestamp(F.col('date_parsed')) + 60).cast('timestamp'))
     .otherwise(F.col('date_parsed'))
)
                             
sparkDF.show()

+-------------------+-----------+------------------------+------------------------+
|        date_parsed|date_minute|date_parsed_updated_expr|date_parsed_updated_unix|
+-------------------+-----------+------------------------+------------------------+
|2022-03-01 13:59:50|         59|     2022-03-01 14:00:50|     2022-03-01 14:00:50|
|2022-05-20 13:45:50|         45|     2022-05-20 13:45:50|     2022-05-20 13:45:50|
|2022-06-21 16:59:50|         59|     2022-06-21 17:00:50|     2022-06-21 17:00:50|
|2022-10-22 20:59:50|         59|     2022-10-22 21:00:50|     2022-10-22 21:00:50|
+-------------------+-----------+------------------------+------------------------+
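A reason to add an interval rather than rewriting the hour and minute fields by hand is that the addition carries over into the day, month and year automatically. A stdlib sketch of the same when-otherwise logic on a hypothetical end-of-year value (not part of the sample data) shows this; bump_if_59 and naive_bump are illustrative names only.

```python
from datetime import datetime, timedelta


def bump_if_59(ts: datetime) -> datetime:
    # Mirrors when(minute == 59, ts + 1 minute).otherwise(ts)
    return ts + timedelta(minutes=1) if ts.minute == 59 else ts


def naive_bump(ts: datetime) -> datetime:
    # Hand-rolled alternative: set minute to 0 and increment the hour field.
    # datetime.replace raises ValueError when the hour would become 24.
    return ts.replace(hour=ts.hour + 1, minute=0) if ts.minute == 59 else ts


new_year = datetime(2022, 12, 31, 23, 59, 50)
print(bump_if_59(new_year))  # 2023-01-01 00:00:50

try:
    naive_bump(new_year)
except ValueError as exc:
    print("naive approach fails:", exc)
```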
Answered By: Vaebhav