Flink python job execution fails
Question:
I have a Flink cluster with 3 nodes set up. In the web interface I see 3 Task Managers, 3 Task Slots and 3 Available Task Slots.
I’m trying to run a simple word count and it fails consistently two out of three times. I assume it fails whenever it is not running on the master.
Here are my flink-conf.yaml and word_count.py files.
The exception I’m getting is:
Caused by: java.lang.RuntimeException: Plan file caused an error. Check log-files for details.
python: can't open file '/data/tmp/flink/flink-dist-cache-9fc4a122-1f21-4930-a998-db31129b4596/a68369119ce030c8ca4a0b98aeb39387/flink_dc/plan.py': [Errno 2] No such file or directory
(Full execution with stack trace is here.)
I checked all the folders and they all have rwx permissions.
Does anyone have an idea what I am doing wrong?
Answers:
You have to set the python.dc.tmp.dir parameter to point to a file-system location that is accessible by all nodes (such as HDFS).
You can find all configuration options for the Python API here: https://github.com/apache/flink/blob/master/flink-libraries/flink-python/src/main/java/org/apache/flink/python/api/PythonOptions.java
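As a rough way to check this before submitting a job, the sketch below reads the flat "key: value" lines of a flink-conf.yaml and reports whether python.dc.tmp.dir points at a location that looks shared. The helper names, the set of schemes treated as shared, and the example path hdfs:///tmp/flink_dc are assumptions made for illustration, not part of Flink. A plain local path on a mount shared by all nodes (NFS, for example) would also work in practice, but this check cannot detect that.

```python
from urllib.parse import urlparse

# Assumption: URI schemes treated as reachable from every node.
# Adjust this set to match the file systems your cluster actually uses.
SHARED_SCHEMES = {"hdfs", "s3", "s3a"}


def parse_flink_conf(text):
    """Parse the flat 'key: value' lines of a flink-conf.yaml into a dict."""
    conf = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)  # split on the first colon only
        conf[key.strip()] = value.strip()
    return conf


def dc_tmp_dir_is_shared(conf):
    """Return True if python.dc.tmp.dir is set and uses a shared scheme."""
    value = conf.get("python.dc.tmp.dir")
    if value is None:
        return False
    return urlparse(value).scheme in SHARED_SCHEMES


if __name__ == "__main__":
    # Hypothetical configuration; the HDFS path is only an example.
    sample = "jobmanager.rpc.port: 6123\npython.dc.tmp.dir: hdfs:///tmp/flink_dc\n"
    print(dc_tmp_dir_is_shared(parse_flink_conf(sample)))
```

If the check fails for the flink-conf.yaml on any node, that node is a likely candidate for the "No such file or directory" error on plan.py.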
I found a simple Flink word count example. It is run with:
$ ./bin/flink run examples/streaming/WordCount.jar
More details, including the contents of the .jar, can be found at
https://bckinfo.com/applications/how-to-install-apache-flink-on-centos-8/
I hope this helps.