Explode a PySpark column with root name intact
Question:
I have a PySpark DataFrame whose schema looks like this:
 |-- col1: timestamp (nullable = true)
 |-- col2: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- NM: string (nullable = true)
How can I explode col2 so that the final column names look like col1, col2.NM, etc.?
Answers:
Update:
Since you have multiple such columns, you can create a list of those columns and use the following:
from itertools import chain
from pyspark.sql import functions as F

cols_to_explode = ["col2", "col3"]
other_cols = [F.col(c) for c in df.schema.names if c not in cols_to_explode]
# One flattened column per struct field, aliased as <array column>_<field name>
struct_cols = list(chain(*[[F.col(col + "." + c).alias(col + "_" + c) for c in df.withColumn(col, F.explode(col)).selectExpr(col + ".*").columns] for col in df.schema.names if col in cols_to_explode]))
(
    df
    # Zip the arrays position by position, then explode once so the columns stay aligned
    .withColumn("asZipped", F.arrays_zip(*cols_to_explode))
    .withColumn("asZipped", F.explode("asZipped"))
    .select(other_cols + [F.col("asZipped." + col).alias(col) for col in df.schema.names if col in cols_to_explode])
    .select(other_cols + struct_cols)
    .show(truncate=False)
)
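This would work:
from pyspark.sql import functions as F

(
    df
    # Explode the array so each struct element gets its own row
    .withColumn("col2", F.explode("col2"))
    # Keep the other columns and flatten the struct fields with a col2_ prefix
    .select([F.col(c) for c in df.schema.names if c != "col2"] + [F.col("col2." + c).alias("col2_" + c) for c in df.withColumn("col2", F.explode("col2")).selectExpr("col2.*").columns])
    .show()
)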