How to change the schema of a Spark DataFrame

Question:

I am reading a JSON file with spark.read.json, which automatically gives me a DataFrame with an inferred schema. Is it possible to change the schema of an existing DataFrame to the schema below?

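from pyspark.sql.types import (ArrayType, BooleanType, IntegerType, MapType,
                               StringType, StructField, StructType)

schema = StructType([StructField("_links", MapType(StringType(), MapType(StringType(), StringType()))),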
                     StructField("identifier", StringType()),
                     StructField("enabled", BooleanType()),
                     StructField("family", StringType()),
                     StructField("categories", ArrayType(StringType())),
                     StructField("groups", ArrayType(StringType())),
                     StructField("parent", StringType()),
                     StructField("values", MapType(StringType(), ArrayType(MapType(StringType(), StringType())))),
                     StructField("created", StringType()),
                     StructField("updated", StringType()),
                     StructField("associations", MapType(StringType(), MapType(StringType(), ArrayType(StringType())))),
                     StructField("quantified_associations", MapType(StringType(), IntegerType())),
                     StructField("metadata", MapType(StringType(), StringType()))])
Asked By: Greencolor

Answers:

Once you have the schema defined (as in the question), you can pass it when reading the data:

df = spark.read.json('path_to_json', schema=schema)
Answered By: tomasborrella