Check if columns exist and if not, create and fill with NaN using PySpark

Question:

I have a PySpark dataframe and a separate list of column names. I want to check whether any of the columns in the list are missing from the dataframe, and if they are, create them and fill them with null values.

Is there a straightforward way to do this in PySpark? I can do it in Pandas, but that's not what I need here.

Asked By: A.N.


Answers:

This should work:

 from pyspark.sql import functions as F
 from pyspark.sql.types import StringType

 # add the column as all-null strings if it is not already present
 if 'col' not in df.schema.names:
     df = df.withColumn('col', F.lit(None).cast(StringType()))
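If you need to cover the whole list of column names from the question, a minimal sketch along the same lines (the list name expected_cols and its contents are just placeholders for illustration):

 from pyspark.sql import functions as F
 from pyspark.sql.types import StringType

 # hypothetical list of required column names
 expected_cols = ['id', 'name', 'score']

 # create each missing column as an all-null string column
 for c in expected_cols:
     if c not in df.columns:
         df = df.withColumn(c, F.lit(None).cast(StringType()))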

Let me know if you face any issue.

Answered By: Ronak Jain