Same Python code runs 20 times slower in Jupyter Notebook than from the command line

Question:

I have Python code that uses Hugging Face Transformers to run an NLP task on a PDF document. When I run this code in Jupyter Notebook, it takes more than 1.5 hours to complete. I then set up the same code to run via a locally hosted Streamlit web app. To my surprise, it ran in under 5 minutes!

I believe I am comparing apples to apples because:

  • I am analyzing the same PDF document in each case
  • Since the Streamlit app is locally hosted, all computation is running on my laptop CPU. I am not using any Hugging Face virtual resources. The HF models are being downloaded to my computer.
  • The Jupyter Notebook is also running locally on my computer
  • The .py file is generated from the Jupyter Notebook using ‘streamlit-jupyter’, which just takes the Python code in the notebook and adds a few Streamlit statements

So it is essentially the same code running on the same data on the same hardware.

The only differences I can think of which may explain this are:

  • Streamlit is running a .py python file from the command line instead of a .ipynb notebook
  • Streamlit is running inside a virtual environment instead of my main Python installation
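To confirm which interpreter and environment each run actually uses, a quick check like the one below can be run both in a notebook cell and from the command line (or inside the Streamlit app). It is a minimal sketch using only the standard library; the helper name describe_environment is hypothetical, and the venv check only detects environments created with the standard venv module (or recent virtualenv), not conda environments.

```python
import platform
import sys


def describe_environment() -> dict:
    """Report which interpreter is running and whether it lives in a venv.

    Hypothetical helper for comparing a Jupyter kernel with a command-line run.
    """
    # For venv-style environments, sys.prefix points at the venv while
    # sys.base_prefix points at the Python installation it was created from.
    in_venv = sys.prefix != sys.base_prefix
    return {
        "executable": sys.executable,
        "python_version": platform.python_version(),
        "prefix": sys.prefix,
        "in_venv": in_venv,
    }


if __name__ == "__main__":
    # In a notebook cell __name__ is also "__main__", so this prints there too.
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the executable or prefix printed in the notebook differs from the one printed on the command line, the two runs are using different package sets.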

Has anyone ever experienced something like this? Can running the same Python code from the command line really make it 20x faster?

Edit: As suggested by @Wayne, I compared the output of pip list between my main Python installation and the venv and found some differences. So I updated all the core packages used by the NLP task to their latest versions, and now the run time is the same. I still don’t know which package was responsible, but it probably doesn’t matter now.

[Image: MS Excel comparison of package versions]

Asked By: Ambar Nag


Answers:

It is likely that this is due to a difference in the two environments.

You can run %pip list in the notebook and the equivalent of pip list in the other environment and compare.
One of the packages may have a bug, or may not work optimally with the versions of the other packages involved.
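The comparison suggested above can be scripted instead of done by eye. This is a minimal sketch, assuming pip is available in both environments. freeze_current_env runs pip list --format=freeze through sys.executable, so inside Jupyter it reports the kernel’s environment rather than whichever pip happens to be first on PATH. diff_versions then lists every package whose version differs or that is missing from one side. All function names are hypothetical, and versions are compared as plain strings, which is enough to spot mismatches.

```python
import re
import subprocess
import sys


def freeze_current_env() -> str:
    """Return `pip list --format=freeze` output for the interpreter running this code."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "list", "--format=freeze"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def normalize(name: str) -> str:
    """Normalize a distribution name (PEP 503 style) so 'Foo_Bar' matches 'foo-bar'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_freeze(text: str) -> dict[str, str]:
    """Parse 'name==version' lines into {normalized_name: version}; skip other lines."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        versions[normalize(name.strip())] = version.strip()
    return versions


def diff_versions(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return {package: (version_in_a, version_in_b)} for every mismatch.

    None means the package is not installed in that environment.
    """
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


def compare_files(path_a: str, path_b: str) -> None:
    """Print the differences between two saved freeze files."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        diff = diff_versions(parse_freeze(fa.read()), parse_freeze(fb.read()))
    for name, (ver_a, ver_b) in diff.items():
        print(f"{name:30} {ver_a or '-':>15} {ver_b or '-':>15}")
```

Usage: in each environment, save the output with something like open("notebook_env.txt", "w").write(freeze_current_env()), then call compare_files("notebook_env.txt", "venv_env.txt") from either one. The file names are placeholders. Packages such as torch, transformers, and numpy that show up in the diff are the first candidates to align.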

Answered By: Wayne