Explode a PySpark column with root name intact

Question

I have a PySpark DataFrame whose schema looks like this:

 |-- col1: timestamp (nullable = true)
 |-- col2: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- NM: string (nullable = true)

How can I explode col2 so that the final column names look like col1, col2.NM, etc.?

Asked By: ista120

Answer 1

Update:

Since you have multiple such columns, you can create a list of those columns and use the following:

from itertools import chain
from pyspark.sql import functions as F

cols_to_explode = ["col2", "col3"]
other_cols = [F.col(c) for c in df.schema.names if c not in cols_to_explode]
# One flattened column per struct field, named <array column>_<field>
struct_cols = list(chain(*[[F.col(col + "." + c).alias(col + "_" + c) for c in df.withColumn(col, F.explode(col)).selectExpr(col + ".*").columns] for col in df.schema.names if col in cols_to_explode]))

# Parentheses let the method chain span several lines
(
    df
    .withColumn("asZipped", F.arrays_zip(*cols_to_explode))
    .withColumn("asZipped", F.explode("asZipped"))
    .select(other_cols + [F.col("asZipped." + col).alias(col) for col in df.schema.names if col in cols_to_explode])
    .select(other_cols + struct_cols)
    .show(truncate=False)
)
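
The reason for zipping before exploding is that exploding col2 and then col3 separately would pair every element of col2 with every element of col3, while arrays_zip pairs them by position. The plain-Python sketch below mimics that for a single row, with no Spark involved. The helper name zip_then_explode, the sample row, and the ID field of col3 are made up for illustration, and the sketch assumes every array is non-null.

```python
from itertools import zip_longest

def zip_then_explode(row, cols_to_explode):
    """Mimic arrays_zip followed by explode for one row held as a plain dict.

    Hypothetical helper for illustration only; assumes every array is a non-null list.
    """
    other = {k: v for k, v in row.items() if k not in cols_to_explode}
    arrays = [row[c] for c in cols_to_explode]
    out = []
    # zip_longest pads the shorter list with None, much as arrays_zip pads with null
    for elems in zip_longest(*arrays):
        new_row = dict(other)
        new_row.update(zip(cols_to_explode, elems))
        out.append(new_row)
    return out

# Sample row: col3 and its ID field are assumptions, not taken from the question
row = {"col1": "2021-01-01", "col2": [{"NM": "a"}, {"NM": "b"}], "col3": [{"ID": 1}]}
result = zip_then_explode(row, ["col2", "col3"])
```

Here result holds two rows rather than the 2 x 1 cross product of separate explodes, and the second row has None for col3 because that array is shorter.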

For a single column, this would work:

from pyspark.sql import functions as F

(
    df
    .withColumn("col2", F.explode("col2"))
    .select([F.col(c) for c in df.schema.names if c != "col2"] + [F.col("col2." + c).alias("col2_" + c) for c in df.withColumn("col2", F.explode("col2")).selectExpr("col2.*").columns])
    .show()
)
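
The code above names the flattened column col2_NM, whereas the question asks for col2.NM. A dot is allowed in a column name if the alias is quoted with backticks. As a minimal sketch, the hypothetical helper flatten_exprs below only builds the expression strings that could be passed to selectExpr; it assumes the field names are simple identifiers and does not touch Spark itself.

```python
def flatten_exprs(col, fields, sep="_"):
    """Build selectExpr strings that pull each struct field out of `col`.

    Hypothetical helper; assumes `col` and `fields` are simple identifiers.
    """
    # Backticks quote the alias so a separator such as "." stays part of the name
    return ["{0}.{1} AS `{0}{2}{1}`".format(col, f, sep) for f in fields]

exprs = flatten_exprs("col2", ["NM"], sep=".")
# exprs == ["col2.NM AS `col2.NM`"]
# Possible use, assuming `exploded` is the DataFrame after the explode step:
# exploded.selectExpr("col1", *exprs)
```

A column literally named col2.NM has to be referenced with backticks afterwards, for example F.col("`col2.NM`"), because an unquoted dot is read as struct field access.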

Answered By: Ronak Jain
