Transform a Spark data frame


I have a data frame in Spark with the following format:

|Column 1  |  Values |
|    A     | value1  |
|    B     | value2  |
|    C     | value2  |
|    A     | value1  |
|    B     | value3  |
|    C     | value1  |
|    A     | value1  |
|    B     | value1  |
|    C     | value2  |

I would like to transform it into the following by counting the number of occurrences of each value:

|Column 1  |  value1 | value2   |  value3 |
|    A     |      3  |    0     |   0     |
|    B     |      1  |    1     |   1     |
|    C     |      1  |    2     |   0     |
Asked By: Error-F



You can use the pivot method as follows:

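# Assumes an active SparkSession named spark (as in the pyspark shell).
df = spark.createDataFrame(
    [("a", "value1"), ("b", "value2"), ("c", "value2"),
     ("a", "value1"), ("b", "value3"), ("c", "value1"),
     ("a", "value1"), ("b", "value1"), ("c", "value2")],
    ['col1', 'col2'])

# Group by col1, turn each distinct col2 value into a column of counts,
# and replace the nulls of missing combinations with 0.
pivotDF = df.groupBy("col1").pivot("col2").count().na.fill(0)
pivotDF.show()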

Here is the output I get for the code above with Spark 2.3:

|col1|value1|value2|value3|
|   c|     1|     2|     0|
|   b|     1|     1|     1|
|   a|     3|     0|     0|
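
If you want to sanity-check these counts without a Spark session, the same tally can be reproduced in plain Python. This is only a rough sketch of what the group/pivot/count/fill chain computes on the sample rows, not how Spark does it internally; the names rows, pair_counts, columns and table are made up for the illustration.

```python
from collections import Counter

# Same sample rows as in the Spark example above.
rows = [
    ("a", "value1"), ("b", "value2"), ("c", "value2"),
    ("a", "value1"), ("b", "value3"), ("c", "value1"),
    ("a", "value1"), ("b", "value1"), ("c", "value2"),
]

# Count every (col1, col2) pair; this mirrors groupBy + pivot + count.
pair_counts = Counter(rows)

# The distinct col2 values become the output columns, in sorted order.
columns = sorted({value for _, value in rows})

# One output row per col1 key. A Counter returns 0 for a pair that never
# occurs, which plays the role of na.fill(0).
table = {
    key: [pair_counts[(key, value)] for value in columns]
    for key in sorted({key for key, _ in rows})
}

print(columns)
for key, counts in table.items():
    print(key, counts)
```

The rows come out sorted by key here, whereas Spark does not guarantee any row order unless you sort explicitly, which is why the output above lists c, b, a.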
Answered By: ozlemg