Execute a Jupyter notebook with papermill and output a unique filename
Question:
I’d like to use papermill as part of a data science workflow to record experiments. The key idea is that the output notebook should be stored as a unique artifact, an immutable record of the experiment. As such, I want the output to have a unique filename, such as experiment_<hash>.ipynb. How can I do this automatically at the Linux CLI? From the papermill docs, it looks like I must specify the exact output filename, like
papermill local/input.ipynb s3://bkt/output.ipynb -f parameters.yaml
whereas what I really want is something like
papermill local/input.ipynb s3://bkt/output_[UNIQUE HASH HERE].ipynb -f parameters.yaml
I want to do this within the papermill call automatically. A manual way would be
$ md5sum input.ipynb | cut -d ' ' -f 1
22f69c25ee3a855b17fead21e702668a
$ papermill local/input.ipynb s3://bkt/output_22f69c25ee3a855b17fead21e702668a.ipynb -f parameters.yaml
but I don’t want to do it manually with cut and paste.
Answers:
You can use command substitution. For example, the following derives the name from the current Unix time, so it changes once per second:
papermill local/input.ipynb s3://bkt/output_`date +%s | sha256sum | base64 | head -c 32`.ipynb -f parameters.yaml
or, with the newer $(...) syntax:
papermill local/input.ipynb s3://bkt/output_$(date +%s | sha256sum | base64 | head -c 32).ipynb -f parameters.yaml
You can also write a Python script and generate the unique ID:
# run_experiment.py
import uuid

import papermill as pm

# A random UUID gives every run its own output filename
experiment_id = str(uuid.uuid4())
pm.execute_notebook('input.ipynb', f'{experiment_id}.ipynb')
Then run it:
python run_experiment.py
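If the name should be a hash of the experiment's inputs (as in the md5sum example from the question) rather than a random ID, a variant of the script could look like the sketch below. It is only an illustration under stated assumptions: the helper names unique_output_path and run_experiment are hypothetical, the s3://bkt/output_ prefix and file paths are taken from the question, and the parameter file is assumed to be plain YAML that PyYAML can load.

```python
# run_experiment_hashed.py (hypothetical filename)
import hashlib
from pathlib import Path


def unique_output_path(notebook, parameters_file, prefix="s3://bkt/output_"):
    """Build an output name from the SHA-256 of the notebook and parameter file."""
    digest = hashlib.sha256()
    for path in (notebook, parameters_file):
        data = Path(path).read_bytes()
        # Length-prefix each file so the boundary between the two is part of the hash
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    # Keep the first 32 hex characters, the same length as an md5 digest
    return f"{prefix}{digest.hexdigest()[:32]}.ipynb"


def run_experiment(notebook="local/input.ipynb", parameters_file="parameters.yaml"):
    """Execute the notebook and write it to a content-addressed output path."""
    # Imported here so the hashing helper can be used without papermill installed
    import papermill as pm
    import yaml  # PyYAML

    parameters = yaml.safe_load(Path(parameters_file).read_text())
    output = unique_output_path(notebook, parameters_file)
    pm.execute_notebook(notebook, output, parameters=parameters)
    return output
```

Calling run_experiment() from the script's entry point runs the notebook once. Note the trade-off: with a content hash, rerunning the same notebook with the same parameters produces the same filename and overwrites the earlier output, whereas the uuid version above gives every run a new name.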