Transform spark data frame

Question

I have a data frame in Spark with the following format:

+----------+---------+
|Column 1  |  Values |
+----------+---------+
|    A     | value1  |
|    B     | value2  |
|    C     | value2  |
|    A     | value1  |
|    B     | value3  |
|    C     | value1  |
|    A     | value1  |
|    B     | value1  |
|    C     | value2  |
+----------+---------+

I would like to transform it into the following, counting the number of occurrences of each value per entry in Column 1:

+----------+---------+----------+---------+
|Column 1  |  value1 | value2   |  value3 |
+----------+---------+----------+---------+
|    A     |      3  |    0     |   0     |
|    B     |      1  |    1     |   1     |
|    C     |      1  |    2     |   0     |
+----------+---------+----------+---------+

Asked By: Error-F

Answer 1

You can use the pivot method as follows:

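from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # reuses the active session if one exists
rows = [("a", "value1"), ("b", "value2"), ("c", "value2"), ("a", "value1"), ("b", "value3"), ("c", "value1"), ("a", "value1"), ("b", "value1"), ("c", "value2")]
df = spark.createDataFrame(rows, ["col1", "col2"])
df.show()

# Group by col1, turn each distinct col2 value into a column, count rows, and replace nulls with 0
pivotDF = df.groupBy("col1").pivot("col2").count().na.fill(0)
pivotDF.show()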

Here is the output of pivotDF.show() for the code above with Spark 2.3:

+----+------+------+------+
|col1|value1|value2|value3|
+----+------+------+------+
|   c|     1|     2|     0|
|   b|     1|     1|     1|
|   a|     3|     0|     0|
+----+------+------+------+

Answered By: ozlemg
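
To make the semantics of the groupBy/pivot/count chain concrete, here is a minimal plain-Python sketch of the same computation without Spark. It is only an illustration of what the pivot produces, not how Spark implements it; the helper name pivot_count is hypothetical, and sorting the keys and values is an assumption made here for a stable layout.

```python
from collections import Counter

rows = [("a", "value1"), ("b", "value2"), ("c", "value2"),
        ("a", "value1"), ("b", "value3"), ("c", "value1"),
        ("a", "value1"), ("b", "value1"), ("c", "value2")]


def pivot_count(rows):
    """Hypothetical helper: count (col1, col2) pairs, one output row per col1."""
    counts = Counter(rows)                 # (col1, col2) -> number of occurrences
    keys = sorted({k for k, _ in rows})    # distinct col1 entries become rows
    values = sorted({v for _, v in rows})  # distinct col2 entries become columns
    # A Counter returns 0 for pairs it never saw, which plays the role of na.fill(0)
    table = {k: [counts[(k, v)] for v in values] for k in keys}
    return values, table


values, table = pivot_count(rows)
print("col1", *values)
for key, row in table.items():
    print(key, *row)
```

Each distinct entry of the second column becomes an output column, and any (col1, col2) combination that never occurs ends up as 0 rather than missing.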
