I want to sum dates in a loop 13 times using PySpark

Question:

Please help me solve this issue, as I am still new to Python/PySpark.
I want to loop 13 times over the same column, each time adding the next multiple of 7 days to the date.

I have a master table like this:

id  date
1   2019-02-21 10:00:00
2   2019-02-27 09:50:33

etc

I have written a script, but it is wrong: it created a new column, even though I want to create new rows.

I want this kind of result:

id  x   date
1   7   2019-02-21
1   14  2019-02-28
1   21  2019-03-07
1   28  2019-03-14
1   35  2019-03-21
1   42  2019-03-28
1   49  2019-04-04
1   56  2019-04-11
1   63  2019-04-18
1   70  2019-04-25
1   77  2019-05-02
1   84  2019-05-09
1   91  2019-05-16

etc

Asked By: rezha nanda


Answers:

Use the sequence function to generate the day offsets, then explode the resulting array into rows.

Example:

from pyspark.sql.functions import col, expr, explode, to_date

df = (spark.createDataFrame([('1', '2019-02-21 10:00:00'), ('2', '2019-02-27 09:50:33')], ['id', 'date'])
      .withColumn("seq", expr("sequence(0, 91, 7)"))  # array of day offsets: [0, 7, 14, ..., 91]
      .withColumn("date", to_date(col("date"))))
df = df.select("id", "date", explode(col("seq")))  # one row per offset; explode names the new column "col"
df.withColumn("date", expr("date_add(date, col)")).show(200, False)

#+---+----------+---+
#|id |date      |col|
#+---+----------+---+
#|1  |2019-02-21|0  |
#|1  |2019-02-28|7  |
#|1  |2019-03-07|14 |
#|1  |2019-03-14|21 |
#|1  |2019-03-21|28 |
#|1  |2019-03-28|35 |
#|1  |2019-04-04|42 |
#|1  |2019-04-11|49 |
#|1  |2019-04-18|56 |
#|1  |2019-04-25|63 |
#|1  |2019-05-02|70 |
#|1  |2019-05-09|77 |
#|1  |2019-05-16|84 |
#|1  |2019-05-23|91 |
#|2  |2019-02-27|0  |
#|2  |2019-03-06|7  |
#|2  |2019-03-13|14 |
#|2  |2019-03-20|21 |
#|2  |2019-03-27|28 |
#|2  |2019-04-03|35 |
#|2  |2019-04-10|42 |
#|2  |2019-04-17|49 |
#|2  |2019-04-24|56 |
#|2  |2019-05-01|63 |
#|2  |2019-05-08|70 |
#|2  |2019-05-15|77 |
#|2  |2019-05-22|84 |
#|2  |2019-05-29|91 |
#+---+----------+---+
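
Note that sequence(0, 91, 7) yields 14 offsets per id, since it includes the base date at offset 0. If you want exactly 13 rows per id with an x column running 7, 14, ..., 91 and the first row keeping the base date, as in the expected output above, here is a minimal sketch (assuming the same spark session; it rebuilds the input frame since df was overwritten above):

from pyspark.sql.functions import expr, explode, to_date

df2 = spark.createDataFrame([('1', '2019-02-21 10:00:00'), ('2', '2019-02-27 09:50:33')], ['id', 'date'])
result = (df2
          .withColumn("date", to_date("date"))
          .withColumn("x", explode(expr("sequence(7, 91, 7)")))  # x = 7, 14, ..., 91 -> 13 rows per id
          .withColumn("date", expr("date_add(date, x - 7)")))    # x = 7 maps to the base date itself
result.show(26, False)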
Answered By: notNull