How to filter out null values in Spark (Python)

Question:

I’m trying to filter out the null values in a column and check whether the count is greater than 1.

badRows = df.filter($"_corrupt_record".isNotNull) if badRows.count > 0: logger.error("throwing bad rows exception...") schema_mismatch_exception(None, "cdc", item )

I’m getting a syntax error. I also tried to check using:

badRows = df.filter(col("_corrupt_record").isNotNull),
badRows = df.filter(None, col("_corrupt_record")),
badRows = df.filter("_corrupt_record isNotnull")

What is the correct way to filter the rows where there is data in the _corrupt_record column?

Asked By: Vishal Sivala


Answers:

Try, e.g.

import pyspark.sql.functions as F
...
df.where(F.col("colname").isNotNull()) 
...

As you note, most of the variants you tried are not valid PySpark syntax.
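In particular, the $"col" shorthand is Scala-only, and count is a method on a PySpark DataFrame, so it needs parentheses. A minimal sketch applied to your scenario (logger, schema_mismatch_exception and item are your own names and assumed to exist):

import pyspark.sql.functions as F

# keep only the rows where _corrupt_record actually has data
bad_rows = df.where(F.col("_corrupt_record").isNotNull())

# count() is a method, so call it with parentheses
if bad_rows.count() > 0:
    logger.error("throwing bad rows exception...")
    schema_mismatch_exception(None, "cdc", item)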

Answered By: thebluephantom