How to replace values in a column with the maximum value of the same column in PySpark?

Question:

I have a column named version with integer values 1, 2, … up to 8. I want to replace every value with the maximum value present in the same column, which in this case is 8, so 1, 2, 3, 4, 5, 6 and 7 should all become 8. I tried a couple of methods but couldn't get it to work.

testDF = spark.createDataFrame([(1,"a"), (2,"b"), (3,"c"), (4,"d"), (5,"e"), (6,"f"), (7,"g"), (8,"h")], ["version", "name"])
testDF.show()
+-------+----+
|version|name|
+-------+----+
|      1|   a|
|      2|   b|
|      3|   c|
|      4|   d|
|      5|   e|
|      6|   f|
|      7|   g|
|      8|   h|
+-------+----+

Expected output:

+-------+----+
|version|name|
+-------+----+
|      8|   a|
|      8|   b|
|      8|   c|
|      8|   d|
|      8|   e|
|      8|   f|
|      8|   g|
|      8|   h|
+-------+----+
Asked By: Rahul Diggi


Answers:

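Try this: compute the maximum with an aggregation, collect it to the driver, and write it back to every row with lit():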

from pyspark.sql.functions import lit
testDF = testDF.withColumn("version", lit(testDF.agg({"version": "max"}).collect()[0][0]))

Output:

+-------+----+
|version|name|
+-------+----+
|      8|   a|
|      8|   b|
|      8|   c|
|      8|   d|
|      8|   e|
|      8|   f|
|      8|   g|
|      8|   h|
+-------+----+

To set the column to the maximum plus one instead, increment the collected value:

testDF.withColumn("version", lit(testDF.agg({"version": "max"}).collect()[0][0]+1))

Output:

+-------+----+
|version|name|
+-------+----+
|      9|   a|
|      9|   b|
|      9|   c|
|      9|   d|
|      9|   e|
|      9|   f|
|      9|   g|
|      9|   h|
+-------+----+
Answered By: Mohamed Thasin ah