Using numpy from the host OS for a Spark container

Question:

I want to use a Docker image with Apache Spark on Ubuntu 18.04.

The more popular image on Docker Hub has Spark 1.6; the second image has the more recent Spark 2.2.

Neither image has numpy installed, and the basic examples in the Spark MLlib main guide require it.

I’ve tried, unsuccessfully, to install numpy by adding this line to the original Dockerfile for the Spark 2.2 image:

RUN apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose
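For reference, an `apt-get install` line like this typically fails in a Docker build for two reasons: the image's package lists are empty until `apt-get update` runs in the same build step, and without the `-y` flag the install aborts at the interactive confirmation prompt. A minimal sketch of a corrected line (package names assume a Debian/Ubuntu base image with Python 2):

```dockerfile
# Refresh package lists and install non-interactively, in a single layer
RUN apt-get update && \
    apt-get install -y python-numpy python-scipy python-matplotlib \
        ipython ipython-notebook python-pandas python-sympy python-nose
```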

How do I set up the container to use the host OS’s numpy installation? What is the procedure, and is this the right direction at all?

Edit: OS is Ubuntu 18.04

Asked By: Bor


Answers:

Dockerfile:

FROM p7hb/docker-spark

RUN apt-get update && apt-get install -y python-numpy

Build command:

docker build -t my_image .

Run container:

docker run -it --rm my_image /bin/bash

Check numpy:

root@55ce4c59122c:~# python
Python 2.7.13 (default, Jan 19 2017, 14:48:08)
[GCC 6.3.0 20170118] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> print(numpy.__version__)
1.12.1
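
The same check can also be run without opening an interactive shell; a one-liner sketch, assuming the image built above is tagged `my_image`:

```shell
# Print the container's numpy version non-interactively
docker run --rm my_image python -c "import numpy; print(numpy.__version__)"
```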
Answered By: atline