How can I add to a date in a loop 13 times using PySpark?
Question:
Please help me solve this issue, as I am still new to Python/PySpark.
I want to loop 13 times, adding multiples of 7 days to a date, in the same column.
I have a master table like this:

id | date |
---|---|
1 | 2019-02-21 10:00:00 |
2 | 2019-02-27 09:50:33 |

etc.
I created a script, but it is wrong: it created a new column, whereas I want the results as new rows.
I want this kind of result:

id | x | date |
---|---|---|
1 | 7 | 2019-02-21 |
1 | 14 | 2019-02-28 |
1 | 21 | 2019-03-14 |
1 | 28 | 2019-03-21 |
1 | 35 | 2019-03-28 |
1 | 42 | 2019-04-04 |
1 | 49 | 2019-04-11 |
1 | 56 | 2019-04-18 |
1 | 63 | 2019-04-25 |
1 | 70 | 2019-05-02 |
1 | 77 | 2019-05-09 |
1 | 84 | 2019-05-16 |
1 | 91 | 2019-05-16 |

etc.
Answers:
Use the `sequence` function and then `explode`.
Example:
from pyspark.sql.functions import col, explode, expr, to_date

df = spark.createDataFrame([('1', '2019-02-21 10:00:00'), ('2', '2019-02-27 09:50:33')], ['id', 'date']) \
    .withColumn("seq", expr("sequence(0, 91, 7)")) \
    .withColumn("date", to_date(col("date")))
df = df.select("id", "date", explode(col("seq")))
df.withColumn("date", expr("date_add(date, col)")).show(200, False)
#+---+----------+---+
#|id |date |col|
#+---+----------+---+
#|1 |2019-02-21|0 |
#|1 |2019-02-28|7 |
#|1 |2019-03-07|14 |
#|1 |2019-03-14|21 |
#|1 |2019-03-21|28 |
#|1 |2019-03-28|35 |
#|1 |2019-04-04|42 |
#|1 |2019-04-11|49 |
#|1 |2019-04-18|56 |
#|1 |2019-04-25|63 |
#|1 |2019-05-02|70 |
#|1 |2019-05-09|77 |
#|1 |2019-05-16|84 |
#|1 |2019-05-23|91 |
#|2 |2019-02-27|0 |
#|2 |2019-03-06|7 |
#|2 |2019-03-13|14 |
#|2 |2019-03-20|21 |
#|2 |2019-03-27|28 |
#|2 |2019-04-03|35 |
#|2 |2019-04-10|42 |
#|2 |2019-04-17|49 |
#|2 |2019-04-24|56 |
#|2 |2019-05-01|63 |
#|2 |2019-05-08|70 |
#|2 |2019-05-15|77 |
#|2 |2019-05-22|84 |
#|2 |2019-05-29|91 |
#+---+----------+---+
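As a sanity check on the date arithmetic, the same 13 weekly offsets can be reproduced in plain Python with `datetime` (a minimal sketch outside Spark, using the first id's timestamp). Note that if you want the offsets to start at 7 rather than 0, as in the desired output above, you would use `sequence(7, 91, 7)` in the Spark version.

```python
from datetime import datetime, timedelta

# Parse the first id's timestamp and keep only the date part.
start = datetime.strptime("2019-02-21 10:00:00", "%Y-%m-%d %H:%M:%S").date()

# 13 offsets in multiples of 7 days: x = 7, 14, ..., 91.
rows = [(7 * k, start + timedelta(days=7 * k)) for k in range(1, 14)]

for x, d in rows:
    print(x, d.isoformat())  # "7 2019-02-28" through "91 2019-05-23"
```

This matches rows 2 through 14 of the answer's output for id 1, confirming that `date_add` with a step of 7 lands on the same dates.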