Is there a way to run all Jupyter Notebooks inside a directory?
Question:
Introduction
I have a lot of Jupyter Notebooks inside a directory and I want to run them all to see their output.
What I actually do
I have to open each one, click "Restart kernel and re-run the whole notebook", wait a few minutes, and then move on to the next one.
What I wish to do
Find a way to just "press a button" (it can be a script, a command, or anything else), go away for a walk, and come back to read the output.
Thanks in advance!
Answers:
You can convert every notebook that you want to run into a .py file and then create a single notebook that imports them as modules. Something like this:
script1.py:
print('This is the first script.')
script2.py:
print('This is the second script.')
script3.py:
print('...and this is the last one!')
Now import them all in a single notebook (you can create it in Jupyter):
import script1
import script2
import script3
# This is the first script.
# This is the second script.
# ...and this is the last one!
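If you do not want to convert each notebook by hand, nbconvert can export notebooks to scripts. Below is a minimal sketch using nbconvert's Python API (the equivalent CLI is jupyter nbconvert --to script *.ipynb):

from pathlib import Path
from nbconvert import PythonExporter

exporter = PythonExporter()
for nb in Path('.').glob('*.ipynb'):
    # from_filename returns (source, resources); only the source is needed here
    source, _ = exporter.from_filename(str(nb))
    nb.with_suffix('.py').write_text(source)

One caveat: Python caches imported modules, so if you re-run the master notebook in the same session, you need importlib.reload to execute the scripts again.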
You can achieve this with nbconvert or papermill.
See also this answer.
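If you prefer nbconvert, its execution API is the ExecutePreprocessor. A minimal sketch (the 600-second timeout is an arbitrary choice; jupyter nbconvert --to notebook --execute --inplace does the same from the command line):

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor
from pathlib import Path

for path in Path('./run_all').glob('*.ipynb'):
    nb = nbformat.read(path, as_version=4)
    # execute the notebook, using its own directory as the working directory
    ExecutePreprocessor(timeout=600).preprocess(nb, {'metadata': {'path': str(path.parent)}})
    nbformat.write(nb, path)  # write the executed notebook back in place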
Here is an example using papermill:
Installation with Anaconda:
conda install -c conda-forge papermill
Create a new notebook that runs all the notebooks in a specific directory:
import papermill as pm
from pathlib import Path

for nb in Path('./run_all').glob('*.ipynb'):
    pm.execute_notebook(
        input_path=nb,
        output_path=nb,  # path to save the executed notebook (here: overwrite in place)
    )
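Note that output_path=nb overwrites each notebook in place, so a failed run can leave a notebook half-executed. If you would rather keep the originals untouched, write the executed copies to a separate directory; a sketch (the executed/ directory name is just an example):

import papermill as pm
from pathlib import Path

outdir = Path('./run_all/executed')
outdir.mkdir(exist_ok=True)  # hypothetical output directory

for nb in Path('./run_all').glob('*.ipynb'):
    pm.execute_notebook(input_path=nb, output_path=outdir / nb.name)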
papermill and nbclient have some overhead because they spawn a new process (a fresh kernel) for each notebook; they also cannot execute notebooks in parallel.
I ran some benchmarks, and below I show a few options from fastest to slowest (I used these notebooks for benchmarking and timed each run with the time command).
Fastest: Ploomber in parallel (25.440 total)
from ploomber import DAG
from ploomber.products import File
from ploomber.tasks import NotebookRunner
from ploomber.executors import Parallel
from pathlib import Path
from glob import iglob

dag = DAG(executor=Parallel())

for path in iglob('*.ipynb'):
    NotebookRunner(Path(path), File(path), dag=dag,
                   papermill_params=dict(engine_name='embedded'))

if __name__ == '__main__':
    dag.build(force=True)
This requires:
pip install ploomber
Papermill using ploomber-engine (51.256 total)
import papermill as pm
from glob import glob

for nb in glob('*.ipynb'):
    pm.execute_notebook(
        input_path=nb,
        output_path=nb,
        engine_name='embedded',
    )
This requires (in addition to papermill):
pip install ploomber-engine
Ploomber, serial (59.324 total)
from ploomber import DAG
from ploomber.products import File
from ploomber.tasks import NotebookRunner
from pathlib import Path
from glob import iglob

dag = DAG()

for path in iglob('*.ipynb'):
    NotebookRunner(Path(path), File(path), dag=dag,
                   papermill_params=dict(engine_name='embedded'))

if __name__ == '__main__':
    dag.build(force=True)
This requires:
pip install ploomber
Slowest: papermill (1:58.79 total)
import papermill as pm
from glob import glob

for nb in glob('*.ipynb'):
    pm.execute_notebook(
        input_path=nb,
        output_path=nb,
    )
This requires:
pip install papermill
Note: I did not evaluate nbclient since its performance is similar to papermill's.
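For reference, running notebooks with nbclient looks roughly like this (a sketch; as noted above, I did not benchmark it):

import nbformat
from nbclient import NotebookClient
from pathlib import Path

for path in Path('.').glob('*.ipynb'):
    nb = nbformat.read(path, as_version=4)
    NotebookClient(nb).execute()  # runs all cells in a fresh kernel
    nbformat.write(nb, path)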