How to create a label column based on index number (odd/even) in PySpark
Question:
Here’s my input:
index date_id year month day hour minute
0 156454 20200801 2021 12 31 12 38
1 156454 20200801 2021 12 31 12 39
What I want is to add a label column: ‘poi1’ for odd rows and ‘poi2’ for even rows (counting rows from 1, so index 0 is the first, odd row).
Here’s my expected output:
index date_id year month day hour minute label
0 156454 20200801 2021 12 31 12 38 poi1
1 156454 20200801 2021 12 31 12 39 poi2
The pandas code looks like this:
df_movmnt_2["label"] = np.where(((df_movmnt_2.index)+1)%2 != 0, "poi1", "poi2")
Answers:
Use when().otherwise(). The condition below mirrors the pandas expression from the question, so index 0 gets 'poi1' and index 1 gets 'poi2':
from pyspark.sql.functions import col, when
df.withColumn('label', when((col('index') + 1) % 2 != 0, 'poi1').otherwise('poi2')).show()
+-----+-------+--------+-----+---+----+------+---+-----+
|index|date_id|    year|month|day|hour|minute| _8|label|
+-----+-------+--------+-----+---+----+------+---+-----+
|    0| 156454|20200801| 2021| 12|  31|    12| 38| poi1|
|    1| 156454|20200801| 2021| 12|  31|    12| 39| poi2|
+-----+-------+--------+-----+---+----+------+---+-----+
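It is easy to flip the two labels by accident when translating the parity test, so it may help to check the rule in plain Python first. The sketch below is not PySpark code; `label_for` is a hypothetical helper name that restates the question's pandas condition for a 0-based index.

```python
def label_for(index: int) -> str:
    """Restate the question's pandas rule for a 0-based index.

    (index + 1) % 2 != 0 means the row is odd when counted from 1,
    so it gets 'poi1'; otherwise it gets 'poi2'.
    """
    return "poi1" if (index + 1) % 2 != 0 else "poi2"


# Labels for the first four index values: 0, 1, 2, 3
labels = [label_for(i) for i in range(4)]
print(labels)
```

Whatever condition goes inside when() should agree with this mapping: index 0 gives 'poi1' and index 1 gives 'poi2'.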