emr

Amazon EMR: Pyspark having strange dependency issues

Question: I have been having issues getting a PySpark job to run on an EMR cluster, so I logged into the master node and ran spark-submit directly there. I have a Python file that I submit to PySpark, and in this file I have: import subprocess from …

Total answers: 2

How to bootstrap installation of Python modules on Amazon EMR?

Question: I want to do something really basic: simply fire up a Spark cluster through the EMR console and run a Spark script that depends on a Python package (for example, Arrow). What is the most straightforward way of doing this? Asked By: Evan Zamir …

Total answers: 5