Python pyspark columns count
Question:
I have a dataset like this:
Zone 1 | Zone 2 |
---|---|
A | A |
A | B |
B | A |
A | B |
B | B |
And I want this:
Category | Zone | count |
---|---|---|
A | Zone1 | 3 |
B | Zone1 | 2 |
A | Zone2 | 2 |
B | Zone2 | 3 |
I tried a group by on Zone 1 and Zone 2, but it doesn't return the result I want.
Can someone help me?
Thanks in advance.
Answers:
Stack the dataframe, then do a groupby + count:
expr = "stack(2, 'Zone 1', `Zone 1`, 'Zone 2', `Zone 2`) as (zone, category)"
result = df.selectExpr(expr).groupBy('category', 'zone').count()
Result:
+--------+------+-----+
|category|  zone|count|
+--------+------+-----+
|       A|Zone 1|    3|
|       A|Zone 2|    2|
|       B|Zone 2|    3|
|       B|Zone 1|    2|
+--------+------+-----+