How can I access data from a nested dynamic frame to properly format it in Pyspark?
Question:
I’ve uploaded some semi-structured data into AWS Glue using a DynamicFrame. From the dynamic frame I took just the payload element, which I selected by executing the following code in a Glue notebook:
df_p = df.select_fields(["payload"])
I’m trying to convert it to a spark dataframe by executing the following:
spark_df = df_p.toDF()
Instead of giving me a column for each element, I have a single column titled payload. How can I un-nest the data so that I get one column per key, with the values as the rows of the dataframe?
Answers:
What you are looking for is called the explode function. It un-nests one layer, and it applies to array and map columns.
In your case, you would apply it to the spark DF as follows:
from pyspark.sql.functions import explode
df_p = df.select_fields(["payload"])
spark_df = df_p.toDF()
exploded_df = spark_df.select(explode("payload"))
You might need to apply explode again if the content is nested several levels deep, but that is the way to go. Let me know if it helps.