Apache Spark Python to Scala translation

Question:

If I understand correctly, Apache YARN receives the Application Master and Node Manager as JAR files, which are executed as Java processes on the nodes of the YARN cluster.
When I write a Spark program in Python, is it somehow compiled into a JAR?
If not, how is Spark able to execute my Python logic on the YARN cluster nodes?

Asked By: Rtik88


Answers:

The PySpark driver program uses Py4J (http://py4j.sourceforge.net/) to launch a JVM and create a SparkContext. Spark RDD operations written in Python are mapped to operations on PythonRDD in the JVM.
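As a rough illustration, here is a minimal PySpark driver sketch (the app name and the local master URL are just placeholders for the example); creating the SparkContext is the step that launches the JVM via Py4J:

    # Minimal sketch of a PySpark driver; assumes PySpark is installed
    # and uses local mode purely for illustration.
    from pyspark import SparkContext

    # Creating the SparkContext launches a JVM via Py4J; the Python
    # driver process talks to that JVM over a local socket.
    sc = SparkContext("local[2]", "py4j-demo")

    # This map() becomes a PythonRDD operation in the JVM; the lambda
    # itself is serialized for later execution in Python worker processes.
    rdd = sc.parallelize(range(10)).map(lambda x: x * x)
    print(rdd.collect())  # [0, 1, 4, ..., 81]

    sc.stop()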

On the remote workers, PythonRDD launches sub-processes which run Python. The data and code are passed from the remote worker's JVM to its Python sub-process using pipes.
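You can observe this sub-process behaviour directly. A hedged sketch (local mode here, but the same mechanism applies on YARN executors): the PIDs reported from inside map() belong to the Python workers, not to the driver's Python process.

    import os
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "worker-pid-demo")

    driver_pid = os.getpid()
    # Each partition is handled by a Python worker launched by the
    # executor JVM, so the PIDs seen inside map() differ from the driver's.
    worker_pids = sc.parallelize(range(4), 4).map(lambda _: os.getpid()).collect()

    print("driver:", driver_pid)
    print("workers:", set(worker_pids))  # normally does not contain driver_pid
    sc.stop()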

Therefore, your YARN nodes must have Python installed for this to work.

The Python code is not compiled to a JAR; instead, Spark distributes it around the cluster at runtime. To make this possible, user functions written in Python are pickled using cloudpickle: https://github.com/apache/spark/blob/master/python/pyspark/cloudpickle.py
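To see why cloudpickle is needed: the standard pickle module serializes a function by reference (module and name), which breaks when the receiving process does not have that module, whereas cloudpickle serializes the function's code and its captured values. A small stand-alone illustration (not Spark's actual wire protocol; it assumes the standalone cloudpickle package is installed):

    import pickle
    import cloudpickle  # assumed installed: pip install cloudpickle

    factor = 3

    def scale(x):
        # References `factor` from the surrounding scope; cloudpickle ships
        # the function's code plus that value, not just a module reference.
        return x * factor

    payload = cloudpickle.dumps(scale)  # bytes that can cross the network

    # On the "worker" side, plain pickle can deserialize the payload.
    restored = pickle.loads(payload)
    print(restored(7))  # 21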

Source: https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals

Answered By: mattinbits

Maybe this website can help you resolve your problem. It lets you translate data-oriented source code (for example, convert PySpark to Scala): https://www.deepcodetranslate.com/
Tell me if it's OK.
Thanks

Answered By: Walid BEN GHEZALA