Is there a way to run all Jupyter Notebooks inside a directory?
Question:
Introduction
I have a lot of Jupyter Notebooks inside a directory and I want to run them all to see their output.
What I actually do
I have to open each one, click "Restart kernel and re-run the whole notebook", wait a few minutes, and then move on to the next one.
What I wish to do
Find a way to just "press a button" (it can be a script, a command, or anything else), go away for a walk, and come back to read the output.
Thanks in advance!
Answers:
You can convert every notebook that you want to run into a .py file and then create a single notebook that imports them as modules. Something like this:
script1.py:
print('This is the first script.')
script2.py:
print('This is the second script.')
script3.py:
print('...and this is the last one!')
Now import them all in a single notebook (you can create it in Jupyter):
import script1
import script2
import script3
# This is the first script.
# This is the second script.
# ...and this is the last one!
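If you do not want to convert each notebook by hand, nbconvert can export notebooks to scripts. Below is a minimal sketch using nbconvert's Python API (the equivalent CLI is jupyter nbconvert --to script *.ipynb):

from pathlib import Path
from nbconvert import PythonExporter

exporter = PythonExporter()
for nb in Path('.').glob('*.ipynb'):
    # from_filename returns (source, resources); only the source is needed here
    source, _ = exporter.from_filename(str(nb))
    nb.with_suffix('.py').write_text(source)

One caveat: Python caches imported modules, so if you re-run the master notebook in the same session, you need importlib.reload to execute the scripts again.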
You can achieve this with nbconvert or papermill.
See also this answer.
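If you prefer nbconvert, its execution API is the ExecutePreprocessor. A minimal sketch (the 600-second timeout is an arbitrary choice; jupyter nbconvert --to notebook --execute --inplace does the same from the command line):

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor
from pathlib import Path

for path in Path('./run_all').glob('*.ipynb'):
    nb = nbformat.read(path, as_version=4)
    # execute the notebook, using its own directory as the working directory
    ExecutePreprocessor(timeout=600).preprocess(nb, {'metadata': {'path': str(path.parent)}})
    nbformat.write(nb, path)  # write the executed notebook back in place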
Here is an example using papermill:
Installation with Anaconda:
conda install -c conda-forge papermill
Create a new notebook that runs all the notebooks in a specific directory:
import papermill as pm
from pathlib import Path

for nb in Path('./run_all').glob('*.ipynb'):
    pm.execute_notebook(
        input_path=nb,
        output_path=nb,  # path to save the executed notebook (here: overwrite in place)
    )
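Note that output_path=nb overwrites each notebook in place, so a failed run can leave a notebook half-executed. If you would rather keep the originals untouched, write the executed copies to a separate directory; a sketch (the executed/ directory name is just an example):

import papermill as pm
from pathlib import Path

outdir = Path('./run_all/executed')
outdir.mkdir(exist_ok=True)  # hypothetical output directory

for nb in Path('./run_all').glob('*.ipynb'):
    pm.execute_notebook(input_path=nb, output_path=outdir / nb.name)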
papermill and nbclient have some overhead because they spawn a new process (a fresh kernel) for each notebook; they also cannot execute notebooks in parallel.
I ran some benchmarks, and below I show a few options from fastest to slowest (I used these notebooks for benchmarking and timed each run with the time command).
Fastest: Ploomber in parallel (25.440 total)
from ploomber import DAG
from ploomber.products import File
from ploomber.tasks import NotebookRunner
from ploomber.executors import Parallel
from pathlib import Path
from glob import iglob

dag = DAG(executor=Parallel())

for path in iglob('*.ipynb'):
    NotebookRunner(Path(path), File(path), dag=dag,
                   papermill_params=dict(engine_name='embedded'))

if __name__ == '__main__':
    dag.build(force=True)
This requires:
pip install ploomber
Papermill using ploomber-engine (51.256 total)
import papermill as pm
from glob import glob

for nb in glob('*.ipynb'):
    pm.execute_notebook(
        input_path=nb,
        output_path=nb,
        engine_name='embedded',
    )
This requires (in addition to papermill):
pip install ploomber-engine
Ploomber, serial (59.324 total)
from ploomber import DAG
from ploomber.products import File
from ploomber.tasks import NotebookRunner
from pathlib import Path
from glob import iglob

dag = DAG()

for path in iglob('*.ipynb'):
    NotebookRunner(Path(path), File(path), dag=dag,
                   papermill_params=dict(engine_name='embedded'))

if __name__ == '__main__':
    dag.build(force=True)
This requires:
pip install ploomber
Slowest: papermill (1:58.79 total)
import papermill as pm
from glob import glob

for nb in glob('*.ipynb'):
    pm.execute_notebook(
        input_path=nb,
        output_path=nb,
    )
This requires:
pip install papermill
Note: I did not evaluate nbclient since its performance is similar to papermill's.
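For reference, running notebooks with nbclient looks roughly like this (a sketch; as noted above, I did not benchmark it):

import nbformat
from nbclient import NotebookClient
from pathlib import Path

for path in Path('.').glob('*.ipynb'):
    nb = nbformat.read(path, as_version=4)
    NotebookClient(nb).execute()  # runs all cells in a fresh kernel
    nbformat.write(nb, path)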