How to sort by count with groupBy in a Spark DataFrame
Question:
Answers:
.show() returns None, so you cannot chain any DataFrame method after it. Remove it and use orderBy to sort the result DataFrame:
from pyspark.sql.functions import hour, col
# note: this rebinds the name `hour`, shadowing the imported hour() function,
# so hour("date") can no longer be called after this line
hour = checkin.groupBy(hour("date").alias("hour")).count().orderBy(col('count').desc())
Or:
from pyspark.sql.functions import hour, desc
checkin.groupBy(hour("date").alias("hour")).count().orderBy(desc('count')).show()
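To see why chaining breaks, here is a minimal pure-Python sketch (no Spark needed; the Table class is a made-up stand-in for a DataFrame, not a real Spark API): a method that returns None, like show(), ends the chain, while a method that returns a new object supports it.

```python
# Hypothetical stand-in for a DataFrame: show() prints and returns None,
# while order_by() returns a new object and therefore supports chaining.
class Table:
    def __init__(self, rows):
        self.rows = rows

    def show(self):
        # prints the rows and implicitly returns None, like DataFrame.show()
        print(self.rows)

    def order_by(self, key, descending=False):
        # returns a new Table, so further methods can be chained onto it
        return Table(sorted(self.rows, key=key, reverse=descending))

t = Table([3, 1, 2])
result = t.show()      # prints [3, 1, 2]
print(result is None)  # True: you cannot call .order_by() on this result
```

Chaining works in the other order, e.g. t.order_by(lambda x: x, descending=True).show(), because order_by() hands back a Table.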
.show() returns None, so either assign the result to the variable hour and then call hour.show(), or don't assign it at all, as @Psidom suggests.
BTW, @Psidom's snippet doesn't work for me. The code below works with Spark 3.2 or above; I haven't checked earlier versions.
checkin.groupBy(hour("date").alias("hour")).count().orderBy('count', ascending=False).show()
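The groupBy/count/orderBy pipeline can be mimicked in plain Python to check the expected shape of the result. This is only a sketch: the sample timestamps below are made up, standing in for the "date" column of the checkin DataFrame from the question.

```python
from collections import Counter
from datetime import datetime

# Made-up sample data standing in for the checkin DataFrame's "date" column.
dates = [
    datetime(2023, 1, 1, 9, 15),
    datetime(2023, 1, 1, 9, 45),
    datetime(2023, 1, 2, 18, 5),
]

# groupBy(hour("date")).count() -> count rows per hour of day
counts = Counter(dt.hour for dt in dates)

# orderBy('count', ascending=False) -> sort hours by their count, descending
ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
print(ordered)  # [(9, 2), (18, 1)]
```

The result has one row per distinct hour with its count, largest first, which is exactly what the Spark versions above produce.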