How to query for the maximum / highest value in a field with PySpark

Question:

The following DataFrame produces the values 0 to 3:

from delta.tables import DeltaTable
from pyspark.sql.functions import col

df = DeltaTable.forPath(spark, '/mnt/lake/BASE/SQLClassification/cdcTest/dbo/cdcmergetest/1').history().select(col("version"))
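(The table path above is environment-specific; for anyone without access to that Delta table, a hypothetical stand-in DataFrame with the same values can be built directly:)

# Hypothetical stand-in: a DataFrame holding the same version values 0 to 3,
# for reproducing the problem without the Delta table above.
df = spark.createDataFrame([(0,), (1,), (2,), (3,)], ["version"])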


Can someone show me how to modify the DataFrame so that it returns only the maximum value, i.e. 3?

I have tried

df.select("*").max("version")

And

df.max("version")

But no luck.

Any thoughts?

Asked By: Patterson


Answers:

Use the max function; either of the following should work:

from pyspark.sql import functions as F

df.select(F.max("version").alias("max_version")).show()

or

df.agg(F.max("version").alias("max_version")).show()

Input:

+-------+
|version|
+-------+
|      0|
|      1|
|      3|
|      2|
+-------+

Output:

+-----------+
|max_version|
+-----------+
|          3|
+-----------+
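If you need the maximum as a plain Python value rather than a one-row DataFrame, you can collect the aggregation result; a minimal sketch:

from pyspark.sql import functions as F

# first() returns the single Row of the aggregation; index 0 is the max value
max_version = df.agg(F.max("version")).first()[0]
print(max_version)  # 3

For the record, the attempts in the question fail because max is a method of GroupedData, not DataFrame, so df.groupBy().max("version") would be a third working variant.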
Answered By: Ronak Jain