Calling referenced functions after mssparkutils.notebook.run()?

Question:

How can I call functions defined in a different Synapse notebook after running the notebook with mssparkutils.notebook.run()?

example:

#parameters
value = "test"
from notebookutils import mssparkutils

mssparkutils.notebook.run("function definitions", 60, {"param": value})
df = load_cosmos_data() #defined in 'function definitions' notebook

This fails with: NameError: name 'load_cosmos_data' is not defined

I can use the functions with the %run command, but I need to be able to pass the parameter through to the function definitions notebook. %run doesn’t allow me to pass a variable as a parameter.

Asked By: blunderoverflow


Answers:

After going through this Official Microsoft Documentation:

When you reference another notebook with notebook.run(), the referenced
notebook runs to completion (with or without an explicit exit()) and then
the source notebook script continues. The two notebooks remain separate
contexts with no relationship between them, so you can't access any
variable defined in the referenced notebook, and the same applies to its
functions.

This is similar to general programming languages: a function's local variables can't be accessed after it returns, unless the function explicitly returns them.

Unfortunately, the exit() method doesn’t support returning values other than strings from the referenced notebook.
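For example, a minimal sketch (the notebook path here is only illustrative): whatever string is passed to exit() in the referenced notebook is what run() returns in the source notebook.

# Referenced notebook
mssparkutils.notebook.exit("some value")

# Source notebook: run() returns the exit() argument, always as a string
returned = mssparkutils.notebook.run("/referenced_notebook", 60)
print(returned)  # "some value"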

From your code above, it looks like you need to access the DataFrame returned by the load_cosmos_data() function in the referenced notebook. You can do that using a temporary view.

Please follow the demonstration below:

In the referenced notebook, call the function, store the returned DataFrame in a variable, and create a temporary view from it. In the source notebook you can then read that view back into a DataFrame.

Function Notebook:
Code:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from notebookutils import mssparkutils

def load_data():
    data2 = [(24, "Rakesh", "Govindula"),
             (16, "Virat", "Kohli")]
    schema = StructType([
        StructField("id", IntegerType(), True),
        StructField("firstname", StringType(), True),
        StructField("lastname", StringType(), True)
    ])
    df = spark.createDataFrame(data=data2, schema=schema)
    return df

df2 = load_data()
df2.show()
df2.createOrReplaceTempView("dataframeview")  # expose the DataFrame to the Spark session
mssparkutils.notebook.exit("dataframeview")   # return the view name (a string) to the caller


Source Notebook:
Code:

value="test"
from notebookutils import mssparkutils
view_name=mssparkutils.notebook.run("/function_notebook", 60, {"param": value})   
df=spark.sql("select * from {0}".format(view_name))
df.show()


With this approach you can pass the parameter through to the function notebook and also access the DataFrame returned from the function.
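The temporary view is visible to the source notebook because notebook.run() executes the referenced notebook in the same Spark session. As a small variation on the spark.sql() call above, you could also read the view by name:

df = spark.table(view_name)  # equivalent to the select * query above
df.show()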

Please go through this SO thread if you face any issues when returning values from a Synapse notebook.

Answered By: Rakesh Govindula

You can’t use mssparkutils.notebook.run() if you want to access functions or variables in the notebook you are running. You have to use the magic command %run.

From Microsoft:

You can use %run magic command to reference another notebook within current notebook’s context. All the variables defined in the reference notebook are available in the current notebook.

No idea what magic this command is doing, but while both approaches can call a referenced notebook, only %run makes the referenced notebook's context (its variables and functions) available to the caller.
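For illustration, a sketch of the %run approach for the original example (the notebook path is an assumption; also, in Synapse the %run magic has to be the only statement in its cell and, as far as I know, only takes literal parameter values rather than variables, which is the limitation the question runs into):

%run /function_definitions { "param": "test" }

# In a later cell, functions defined in the referenced notebook are now in scope
df = load_cosmos_data()
df.show()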

Answered By: Jeff Devine