When does class attribute initialization code run in Python?

Question:

There is a class attribute spark in our AnalyticsWriter class:

class AnalyticsWriter:

    spark = SparkSession.getActiveSession()  # this is not getting executed

I noticed that this code is not executed before a certain class method runs. Note: I have verified that there is already an active SparkSession in the process, so the initialization code is simply not being executed.

    @classmethod
    def measure_upsert(cls) -> DeltaTable:
        assert AnalyticsWriter.spark, "AnalyticsWriter requires an active SparkSession"

I come from JVM-land (Java/Scala), where class-level initialization code runs before any method invocation. What is the equivalent in Python?

Asked By: WestCoastProjects


Answers:

Class attributes are initialized as soon as execution reaches them while the class body runs, so the line containing the getActiveSession() call is run before the class is even fully defined.

class AnalyticsWriter:
    spark = SparkSession.getActiveSession()
    # The code has been run here
    
    # ... other definitions that occur after spark exists ...
# class is complete here
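
To see that ordering without Spark, here is a minimal sketch. fake_get_active_session is a hypothetical stand-in for SparkSession.getActiveSession(), and the events list exists only to record the order in which things run:

```python
events = []

def fake_get_active_session():
    # Hypothetical stand-in for SparkSession.getActiveSession()
    events.append("session lookup ran")
    return object()

events.append("before class statement")

class AnalyticsWriter:
    # Runs right now, while the class body is being executed
    spark = fake_get_active_session()
    events.append("class body continues")

    @classmethod
    def measure_upsert(cls):
        events.append("measure_upsert called")
        return cls.spark

events.append("class is complete")
AnalyticsWriter.measure_upsert()

for event in events:
    print(event)
```

The lookup is recorded before "class is complete", and well before the class method is called.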

I suspect the code is doing something, just not what you expect. You can confirm that it is in fact run with a cheesy hack like:

class AnalyticsWriter:
    spark = (SparkSession.getActiveSession(), print("getActiveSession called", flush=True))[0]

which just makes a tuple of the result of your call and an eager print, then indexes with [0] to keep the session and discard the meaningless None returned by print; you should see the output from the print immediately, when the class is defined, long before you get around to calling any class methods.
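
One consequence worth checking, though nothing in the question confirms it is the cause here: the attribute is evaluated exactly once, when the class is defined (typically at import time). If no session is active at that moment, the attribute stays None even if a session is created later. The sketch below illustrates this with a hypothetical get_active_session stand-in rather than the real Spark API; EagerWriter, LazyWriter and get_spark are illustrative names, not part of the original code.

```python
_active_session = None  # hypothetical stand-in for Spark's "active session" state

def get_active_session():
    # Stand-in for SparkSession.getActiveSession(); None means nothing is active
    return _active_session

class EagerWriter:
    # Evaluated once, while the class body runs; no session is active yet
    spark = get_active_session()

class LazyWriter:
    @classmethod
    def get_spark(cls):
        # Looked up at call time, so it sees a session created after class definition
        session = get_active_session()
        assert session is not None, "LazyWriter requires an active session"
        return session

# A session becomes active only after both classes have been defined
_active_session = "session-1"

print(EagerWriter.spark)
print(LazyWriter.get_spark())
```

If the print hack above shows the call running but the assertion still fails, comparing when the module is imported with when the session is created is a reasonable next step.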

Answered By: ShadowRanger