Same python code runs 20 times slower in Jupyter Notebook compared to command line
Question:
I have Python code that uses Hugging Face Transformers to run an NLP task on a PDF document. When I run this code in a Jupyter Notebook, it takes more than 1.5 hours to complete. I then set up the same code to run via a locally hosted Streamlit web app. To my surprise, it ran in under 5 minutes!
I believe I am comparing apples to apples because:
- I am analyzing the same PDF document in each case
- Since the Streamlit app is locally hosted, all computation is running on my laptop CPU. I am not using any Hugging Face virtual resources. The HF models are being downloaded to my computer.
- The Jupyter Notebook is also running locally on my computer
- The .py file is generated from the Jupyter Notebook using 'streamlit-jupyter', which just takes the Python code in the notebook and adds a few Streamlit statements
So, essentially same code running on same data using same hardware.
The only differences I can think of which may explain this are:
- Streamlit is running a .py Python file from the command line instead of a .ipynb notebook
- Streamlit is running inside a virtual environment instead of my main Python installation
Has anyone ever experienced something like this? Can running the same python code from the command line result in 20x greater speed?
Edit: As suggested by @Wayne, I compared the output of pip list between my main Python installation and the venv and found some differences. So I updated all the core packages used by the NLP task to their latest versions, and now the run-time is the same. I still don't know which package was responsible, but it probably doesn't matter now.
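For anyone wanting to reproduce this comparison, one way (a sketch; the file names and venv path are arbitrary) is to dump each environment's packages in freeze format and diff the results:

```shell
# Dump the active environment's packages, sorted for a stable diff.
pip list --format=freeze | sort > packages_main.txt

# Repeat with the venv active, e.g.:
#   source venv/bin/activate
#   pip list --format=freeze | sort > packages_venv.txt
# Then compare version-by-version:
#   diff packages_main.txt packages_venv.txt

wc -l packages_main.txt   # sanity check: file should be non-empty
```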
Answers:
It is likely that this is due to a difference between the two environments.
You can run %pip list in the notebook and the equivalent pip list in the other environment and compare the outputs.
One of them may have flawed code or may not work optimally with the versions of the other packages involved.
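The same comparison can also be done programmatically from inside each interpreter, which avoids any ambiguity about which pip is on the PATH. A minimal stdlib-only sketch (the helper name is made up for illustration):

```python
# List installed package versions for the *current* interpreter's
# environment, using the stdlib instead of shelling out to pip.
from importlib import metadata

def installed_packages():
    """Return {package_name: version} for the running environment."""
    return {
        dist.metadata["Name"]: dist.version
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    }

# Run this in the notebook and in the venv, then diff the two outputs.
for name, version in sorted(installed_packages().items()):
    print(f"{name}=={version}")
```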