Python pyspark columns count

Question:

I have a dataset like this:

Zone 1   Zone 2
A        A
A        B
B        A
A        B
B        B

And I want this:

Category   Zone    count
A          Zone1   3
B          Zone1   2
A          Zone2   2
B          Zone2   3

I tried a group by on Zone 1 & Zone 2, but it doesn't return the expected result.
Could someone help me?

Thanks in advance

Asked By: Nabs335


Answers:

Stack the dataframe, then do a groupby + count:

expr = "stack(2, 'Zone 1', `Zone 1`, 'Zone 2', `Zone 2`) as (zone, category)"
result = df.selectExpr(expr).groupBy('category', 'zone').count()

Result

+--------+------+-----+
|category|  zone|count|
+--------+------+-----+
|       A|Zone 1|    3|
|       A|Zone 2|    2|
|       B|Zone 2|    3|
|       B|Zone 1|    2|
+--------+------+-----+
Answered By: Shubham Sharma
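
For reference, a minimal end-to-end sketch of the same approach, assuming a local SparkSession and the "Zone 1" / "Zone 2" column names from the question:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample data from the question
df = spark.createDataFrame(
    [("A", "A"), ("A", "B"), ("B", "A"), ("A", "B"), ("B", "B")],
    ["Zone 1", "Zone 2"],
)

# stack(2, ...) unpivots the two zone columns into (zone, category) rows;
# a plain groupBy + count then gives one row per (category, zone) pair.
expr = "stack(2, 'Zone 1', `Zone 1`, 'Zone 2', `Zone 2`) as (zone, category)"
result = df.selectExpr(expr).groupBy('category', 'zone').count()
result.show()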