Error when creating SparkSession in PySpark
Question:
When I try to create a SparkSession, I get this error:
spark = SparkSession.builder.appName("Practice").getOrCreate()
py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getPythonAuthSocketTimeout does not exist in the JVM
This is my code:
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Practice").getOrCreate()
What am I doing wrong? I am following a tutorial online and the commands are exactly the same. However, the tutorial uses Jupyter notebooks and I am using VS Code.
Traceback:
22/09/01 08:50:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
File "c:UsersBERNARD JOSHUAOneDriveDesktopSwinburne Computer SciencePySparkpySpark_test.py", line 4, in <module>
spark = SparkSession.builder.appName("Practice").getOrCreate()
File "C:UsersBERNARD JOSHUAAppDataLocalProgramsPythonPython310libsite-packagespysparksqlsession.py", line 269, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "C:UsersBERNARD JOSHUAAppDataLocalProgramsPythonPython310libsite-packagespysparkcontext.py", line 483, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "C:UsersBERNARD JOSHUAAppDataLocalProgramsPythonPython310libsite-packagespysparkcontext.py", line 197, in __init__
self._do_init(
File "C:UsersBERNARD JOSHUAAppDataLocalProgramsPythonPython310libsite-packagespysparkcontext.py", line 302, in _do_init
self._jvm.PythonUtils.getPythonAuthSocketTimeout(self._jsc)
File "C:UsersBERNARD JOSHUAAppDataLocalProgramsPythonPython310libsite-packagespy4jjava_gateway.py", line 1547, in __getattr__
raise Py4JError(
py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getPythonAuthSocketTimeout does not exist in the JVM
PS C:UsersBERNARD JOSHUAOneDriveDesktopSwinburne Computer SciencePySpark> SUCCESS: The process with PID 18428 (child process of PID 11272) has been terminated.
SUCCESS: The process with PID 11272 (child process of PID 16416) has been terminated.
SUCCESS: The process with PID 16416 (child process of PID 788) has been terminated.
Both my PySpark and Spark installations are the same version.
Answers:
no attribute 'getorCreate'. Did you mean: 'getOrCreate'?
Try capitalising the "o".
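For reference, the builder method name is camel-cased:

# Wrong: AttributeError: 'Builder' object has no attribute 'getorCreate'
# spark = SparkSession.builder.appName("Practice").getorCreate()
# Correct:
spark = SparkSession.builder.appName("Practice").getOrCreate()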
Try any of the following solutions:
Solution 1
Install findspark
pip install findspark
In your code, use:
import findspark
findspark.init()
Optionally, you can also specify "/path/to/spark" in the init method above:
findspark.init("/path/to/spark")
Solution 2:
As outlined in pyspark error does not exist in the jvm error when initializing SparkContext, adding a PYTHONPATH environment variable with the value
%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-<version>-src.zip;%PYTHONPATH%
(just check which py4j version you have in your spark/python/lib folder) helped resolve this issue.
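If you would rather do this from Python than set an environment variable, a rough equivalent is sketched below; it assumes SPARK_HOME points at your Spark install (the fallback path is just an example):

import glob
import os
import sys

# Assumes SPARK_HOME is set; the fallback below is an example path only.
spark_home = os.environ.get("SPARK_HOME", r"C:\spark\spark-3.3.0-bin-hadoop3")

# Find the py4j zip bundled with this Spark build, e.g. py4j-0.10.9.5-src.zip;
# the exact version depends on what is in your spark/python/lib folder.
py4j_zip = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))[0]

# Put Spark's own Python sources ahead of any pip-installed pyspark,
# which is what the PYTHONPATH setting above accomplishes.
sys.path[:0] = [os.path.join(spark_home, "python"), py4j_zip]

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Practice").getOrCreate()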