How to install packages in Jupyter Notebook running in Docker Container

Question:

I tried to set up PySpark on Windows 10. After several challenges, I decided to use a Docker image instead, and it worked great.

The hello world script is working. However, I’m not able to install any packages in the Jupyter instance running in Docker. Please advise.

Normally, I can use the code below on Anaconda terminal:

Issue:

The following command must be run outside the IPython shell:

    $ pip install fastavro

I cannot find out how to install packages inside the Docker container. Please advise.

Resources:

  • Docker image – jupyter/pyspark-notebook
  • Operating System – Windows 10
Asked By: BI Dude

Answers:

In a Jupyter cell or the IPython shell, you can install a package by running:

!pip install PACKAGENAME

Note the ‘!’ prefix, which passes the command to the system shell.

Update

When you have multiple environments, use the Python executable of the environment the kernel is running in, so that the package is installed into that environment:

import sys

!{sys.executable} -m pip install PACKAGENAME
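
The same idea can be written in plain Python, which may be useful where the ‘!’ shell escape is not available (for example in a script). This is only a minimal sketch, assuming pip is available in the kernel’s environment; the helper names pip_install_command and pip_install are made up for illustration:

```python
import subprocess
import sys


def pip_install_command(package):
    """Build the pip command for the interpreter running this kernel."""
    # sys.executable is the path of the Python that runs the notebook kernel,
    # so the package ends up in that same environment.
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    """Run pip and raise CalledProcessError if the installation fails."""
    subprocess.run(pip_install_command(package), check=True)


# Example call in a notebook cell (fastavro is the package from the question):
# pip_install("fastavro")
```

Packages installed this way live only in the running container and are gone once the container is removed.
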
Answered By: Prayson W. Daniel

It is reasonable to save an updated image so that you don’t need to install those packages each time. One way to do this is to build your own image. Let’s say you want to use the jupyter/datascience-notebook image from the Jupyter Docker Stacks. First, create a file named Dockerfile (without an extension) containing the following instructions:

# Start from a core stack version
FROM jupyter/datascience-notebook:latest
# Install in the default python3 environment
RUN pip install --quiet --no-cache-dir 'flake8==3.9.2' && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

Instead of pip, you can use conda or mamba:

# Install a package into the default (python 3.x) environment with mamba
# and clean up after the installation
RUN mamba install --quiet --yes some-package && \
    mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

# The same with conda
RUN conda install --quiet --yes some-package && \
    conda clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

Then go to the directory containing your newly created Dockerfile and run:

$ docker image build --tag jupyter/base-notebook:my_version .

where --tag sets the name of your image in the form repository_name:tag_name. Don’t forget the single dot . at the end: it is the build context, the directory in which Docker looks for the Dockerfile by default.

When Docker has finished building the image, you can find it in the list of images using docker image ls:

REPOSITORY                           TAG               IMAGE ID       CREATED             SIZE
jupyter/base-notebook                my_version        3cf0f4683b46   11 minutes ago      1.12GB

Now you can use your newly created image with the installed packages:

$ docker run -p 8888:8888 jupyter/base-notebook:my_version
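
To confirm that a package baked into the image is really available, you could run a quick check in a notebook cell inside the new container. A minimal sketch using the standard library module importlib.metadata (Python 3.8 or later); installed_version is a hypothetical helper name, not part of the image:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# flake8 is the package pinned in the Dockerfile above.
print(installed_version("flake8"))
```

If this prints None, the package was not installed into the environment the kernel uses.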

Another way to save a modified image is to use the docker commit command. You can install the desired packages directly in a Jupyter notebook and then save the changes using:

$ docker commit CONTAINER_ID jupyter/base-notebook:my_version

You can find CONTAINER_ID using the docker ps command, which lists running containers.
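
If you prefer to script this lookup, the sketch below shows one possible way to do it in Python. It assumes the docker CLI is on the PATH of the host machine (it has to run on the host, not inside the notebook container). The function names are hypothetical, and the container ID in the usage note is made up:

```python
import subprocess


def parse_docker_ps(output):
    """Turn 'ID IMAGE' lines into a list of (container_id, image) tuples."""
    rows = []
    for line in output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            rows.append((parts[0], parts[1]))
    return rows


def running_containers():
    """Ask the docker CLI for running containers (needs docker on PATH)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.ID}} {{.Image}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_docker_ps(result.stdout)


def commit_command(container_id, image_name):
    """Build the docker commit command for the given container and image."""
    return ["docker", "commit", container_id, image_name]
```

For example, commit_command("abc123def456", "jupyter/base-notebook:my_version") returns the argument list for the commit command above, which can then be passed to subprocess.run.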

Answered By: Mykola Zotko