Flink Python job execution fails

Question:

I have a Flink cluster set up with 3 nodes. In the web interface I see 3 Task Managers, 3 Task Slots and 3 Available Task Slots.

I’m trying to run a simple word count, and it fails consistently two out of three times. I assume it fails whenever it is not running on the master.

Here are my flink-conf.yaml and word_count.py files.

The exception I’m getting is:

Caused by: java.lang.RuntimeException: Plan file caused an error. Check log-files for details.
python: can't open file '/data/tmp/flink/flink-dist-cache-9fc4a122-1f21-4930-a998-db31129b4596/a68369119ce030c8ca4a0b98aeb39387/flink_dc/plan.py': [Errno 2] No such file or directory

(Full execution with stack trace is here.)

I checked all the folders, and they all have rwx permissions.

Does anyone have an idea what I am doing wrong?

Asked By: Viktor Kerkez

Answers:

You have to set the python.dc.tmp.dir parameter to point to a file-system location that is accessible from all nodes (such as HDFS).
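
As a rough illustration of the setting named above, the entry in flink-conf.yaml might look like the fragment below. This is only a sketch: the HDFS path is a made-up placeholder, not taken from the original post, and it assumes the cluster can actually reach an HDFS installation. Any other location that every Task Manager can read should serve the same purpose.

```yaml
# flink-conf.yaml (fragment)
# Directory the Python plan and user files are uploaded to before being
# distributed. It must be reachable from every node, not just the one
# that submits the job.
# NOTE: hdfs:///tmp/flink_dc is a hypothetical example path; replace it
# with a shared location that exists in your environment.
python.dc.tmp.dir: hdfs:///tmp/flink_dc
```

With a node-local directory here, plan.py would presumably exist only on the machine that submitted the job, which would match the "No such file or directory" error seen on the other two nodes.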

You can find all configuration options for the Python API here: https://github.com/apache/flink/blob/master/flink-libraries/flink-python/src/main/java/org/apache/flink/python/api/PythonOptions.java

Answered By: Chesnay Schepler

I found a simple Flink word count example. It is run with:

$ ./bin/flink run examples/streaming/WordCount.jar

More details, including the contents of the .jar, can be found at
https://bckinfo.com/applications/how-to-install-apache-flink-on-centos-8/

I hope this helps.

Answered By: user15192218