Filter out specific errors from Flake8 results

Question:

We write notebooks in Databricks. When we commit them to Git, we want to run Flake8 on them to check for new problems in the code.

Because Databricks provides some predefined variables, Flake8 reports them as undefined in the code itself.
Is it possible to filter out errors like:

F821 undefined name 'dbutils'

while keeping errors like:

F821 undefined name 'my_var'

I am aware of the --ignore parameter, but as far as I understand, it would only let me exclude F821 in general, not for a specific variable name.

Thanks

Asked By: jugi


Answers:

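You can specify an additional list of builtins with the --builtins command-line option or the builtins setting in the Flake8 configuration file. The example below uses the name db_utils; for the notebooks in the question, the name would be dbutils: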

$ cat t2.py 
db_utils.wat()
my_var.wat()
$ flake8 t2.py 
t2.py:1:1: F821 undefined name 'db_utils'
t2.py:2:1: F821 undefined name 'my_var'
$ flake8 t2.py --builtins db_utils
t2.py:2:1: F821 undefined name 'my_var'
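
If the Flake8 invocation itself cannot be changed and only its report is available, the same selection can be approximated by post-filtering the output. The following is a minimal sketch, not a Flake8 feature: the function name filter_f821 and the allowed set are hypothetical, and it assumes the default output format shown above.

```python
import re

# Matches an F821 line in Flake8's default output format, e.g.
# "t2.py:1:1: F821 undefined name 'db_utils'"
F821_PATTERN = re.compile(r"^.+:\d+:\d+: F821 undefined name '(?P<name>[^']+)'$")


def filter_f821(lines, allowed):
    """Drop F821 reports whose undefined name is in `allowed`; keep all other lines."""
    kept = []
    for line in lines:
        match = F821_PATTERN.match(line.strip())
        if match and match.group("name") in allowed:
            continue  # name is expected to be provided by the environment
        kept.append(line)
    return kept


# Sample report copied from the example above.
report = [
    "t2.py:1:1: F821 undefined name 'db_utils'",
    "t2.py:2:1: F821 undefined name 'my_var'",
]
for line in filter_f821(report, allowed={"db_utils"}):
    print(line)
```

Where the option is available, --builtins is the simpler choice, since it needs no extra script and does not depend on the output format.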
Answered By: anthony sottile

Add the following at the beginning of the notebook so that spark and dbutils are explicitly defined in the code:

from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

spark = SparkSession.getActiveSession()
dbutils = DBUtils(spark)

Optionally, install databricks-connect instead of pyspark in your local environment so that the pyspark.dbutils import can be resolved. (Flake8 does not check whether the import resolves, but other tools such as VS Code’s Pylance do.)

This also gives code completion in external editors such as VS Code.

Answered By: Tom