Spark SQL – Pivot and concatenation

Question:

I am working with spark sql and have a requirement to pivot and concatenate the data.
My input data looks like

ID Quantity Location
1 10 US
2 20 UK
2 5 CA
2 20 US
3 15 US
3 20 CA
4 25 US
4 10 CA

My Output needs to be able to sum the quantity on ID and store the location with comma separated values.

ID Quantity Location
1 10 US
2 45 UK,CA,US
3 35 US,CA
4 35 US,CA

I tried various methods in spark sql and pyspark available here, but was not able to achieve my results

Asked By: Maximus

||

Answers:

Sql version below if data are in table, using collect_list with concat_ws

%sql
with table1 as (
  select 1 as id, 10 as quantity, 'US' as location union all 
  select 2, 20, 'UK' union all
  select 2, 5, 'CA' union all
  select 2, 20, 'US' union all
  select 3, 15, 'US' union all
  select 3, 20, 'CA' union all
  select 4, 25, 'US' union all
  select 4, 10, 'CA'
)
select 
 id, 
 sum(quantity) as quantity, 
 concat_ws(',', collect_list(location)) as location
from table1
group by id
order by id
id quantity location
1 10 US
2 45 CA,UK,US
3 35 US,CA
4 35 US,CA
Answered By: Isolated
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.