Using a pre-trained ML model in Apache Flink

Question:

I am new to Flink and am trying to use a pre-trained classifier in Flink to detect hate speech on Twitter. I have an SVM classifier that I trained in Python, but I have no idea how to use it in my Flink code.

One of the posts here talks about Async operations, but it goes way over my head. I have also tried using PMML but am facing an issue that I have detailed in a separate question.

Are there other methods or simple examples that could help me with this?

P.S. I am using Flink with Java (not PyFlink).

Asked By: Vishnu Prasad

Answers:

You could look at Stateful Functions, which provides a way to connect Python and Java code.

I don't think the documentation is clear enough, so you may want to check this thread as well.

Answered By: Metehan Yıldırım

I solved this by creating a REST API with Flask and setting up a POST endpoint that calls the pre-trained model.

The server exposes the model to clients.
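
The Flask code itself isn't shown, so as a rough stand-in for testing the Flink side locally without Python, here is a plain-JDK HTTP server (the `com.sun.net.httpserver` package ships with the JDK) that follows the same kind of contract: POST a JSON body, get a prediction back. The `/predict` path, port 5000 and the `{"prediction": ...}` response field are assumptions, and the keyword check is a placeholder, not the trained SVM.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical local stand-in for the Flask model server (not the author's code).
 * Accepts POST /predict with a JSON body and answers {"prediction": 0 or 1}.
 * The "classifier" below is a placeholder keyword check, NOT the trained SVM.
 */
public class ModelServerStub {

    /** Starts the stub on the given port (0 picks a free port) and returns it. */
    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/predict", exchange -> {
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // method not allowed, no body
                exchange.close();
                return;
            }
            String body;
            try (InputStream in = exchange.getRequestBody()) {
                body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            // Placeholder scoring: flag the request if it contains a marker word.
            int prediction = body.toLowerCase().contains("hate") ? 1 : 0;
            byte[] response = ("{\"prediction\": " + prediction + "}").getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, response.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(response); // closing the stream also ends the exchange
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(5000);
        System.out.println("Stub model server listening on http://127.0.0.1:5000/predict");
    }
}
```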

On the Flink side, I added a map function that acts as the client: it sends each input as JSON in a POST request to my server and receives the response, i.e. the prediction.
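
A minimal sketch of that client logic, using only the JDK's `java.net.http.HttpClient` (Java 11+) so it runs without Flink on the classpath. It assumes the server accepts `{"text": "..."}` and replies with `{"prediction": <int>}`; those field names, the URL and the class name `PredictionClient` are hypothetical. The hand-built JSON and regex parsing keep the example dependency-free; a real job would more likely use a JSON library such as Jackson.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical blocking client for a model server that accepts {"text": "..."}
 * and replies {"prediction": <int>}. URL and field names are assumptions.
 */
public class PredictionClient {
    // Matches e.g. "prediction": 1 inside the JSON response.
    private static final Pattern PREDICTION = Pattern.compile("\"prediction\"\\s*:\\s*(-?\\d+)");

    private final HttpClient http;
    private final URI endpoint;

    public PredictionClient(String endpoint) {
        this.http = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        this.endpoint = URI.create(endpoint);
    }

    /** Sends one tweet and blocks until the server answers with a prediction. */
    public int predict(String text) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .timeout(Duration.ofSeconds(10))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJson(text)))
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Model server returned HTTP " + response.statusCode());
        }
        Matcher m = PREDICTION.matcher(response.body());
        if (!m.find()) {
            throw new IOException("No prediction in response: " + response.body());
        }
        return Integer.parseInt(m.group(1));
    }

    /** Builds {"text": "..."}, escaping quotes, backslashes and control characters. */
    static String toJson(String text) {
        StringBuilder sb = new StringBuilder("{\"text\": \"");
        for (char ch : text.toCharArray()) {
            if (ch == '"' || ch == '\\') sb.append('\\').append(ch);
            else if (ch < 0x20) sb.append(String.format("\\u%04x", (int) ch));
            else sb.append(ch);
        }
        return sb.append("\"}").toString();
    }
}
```

In Flink you would typically hold one such client per parallel instance in a `transient` field of a `RichMapFunction`, create it in `open`, and call `predict` from `map`, so the `HttpClient` is never serialized with the job. Each `map` call blocks until the server answers.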

Worked splendidly!

Answered By: Vishnu Prasad

If you prefer the microservice approach, you can implement it similarly to the Flask example above, but more efficiently by using Flink’s Async I/O operator:

https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/

This way your pipeline is not blocked waiting for each HTTP call to return; each operator instance can have many requests in flight at once.
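
To make the non-blocking idea concrete without pulling in Flink, here is a hedged plain-JDK sketch: `HttpClient.sendAsync` returns a `CompletableFuture` immediately, so many tweets can be in flight at once. The class name `AsyncPredictionClient`, the endpoint and the `{"text": ...}` / `{"prediction": ...}` JSON fields are assumptions, not part of any answer above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Hypothetical non-blocking client: each call returns at once with a
 * CompletableFuture, which is what a Flink AsyncFunction needs.
 */
public class AsyncPredictionClient {
    private static final Pattern PREDICTION = Pattern.compile("\"prediction\"\\s*:\\s*(-?\\d+)");

    private final HttpClient http = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .connectTimeout(Duration.ofSeconds(5))
            .build();
    private final URI endpoint;

    public AsyncPredictionClient(String endpoint) {
        this.endpoint = URI.create(endpoint);
    }

    /** Starts the HTTP call and returns immediately; the future completes with the prediction. */
    public CompletableFuture<Integer> predictAsync(String text) {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .timeout(Duration.ofSeconds(10))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJson(text)))
                .build();
        return http.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(response -> {
                    // Unchecked exceptions thrown here complete the future exceptionally.
                    if (response.statusCode() != 200) {
                        throw new IllegalStateException("HTTP " + response.statusCode());
                    }
                    Matcher m = PREDICTION.matcher(response.body());
                    if (!m.find()) {
                        throw new IllegalStateException("No prediction in: " + response.body());
                    }
                    return Integer.parseInt(m.group(1));
                });
    }

    /** Fires all requests first, then waits for them; results keep the input order. */
    public List<Integer> predictAll(List<String> texts) {
        List<CompletableFuture<Integer>> futures = texts.stream()
                .map(this::predictAsync)
                .collect(Collectors.toList());
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    /** Builds {"text": "..."}, escaping quotes, backslashes and control characters. */
    static String toJson(String text) {
        StringBuilder sb = new StringBuilder("{\"text\": \"");
        for (char ch : text.toCharArray()) {
            if (ch == '"' || ch == '\\') sb.append('\\').append(ch);
            else if (ch < 0x20) sb.append(String.format("\\u%04x", (int) ch));
            else sb.append(ch);
        }
        return sb.append("\"}").toString();
    }
}
```

To plug this into Flink, you would implement a `RichAsyncFunction<String, Integer>` (the name `PredictAsyncFunction` here is hypothetical), create the client in `open`, and in `asyncInvoke(tweet, resultFuture)` call `predictAsync(tweet).whenComplete(...)`, completing with `resultFuture.complete(Collections.singleton(prediction))` on success or `resultFuture.completeExceptionally(error)` on failure. The operator is then attached with something like `AsyncDataStream.unorderedWait(tweets, new PredictAsyncFunction(...), 10, TimeUnit.SECONDS, 100)`, where the timeout and the capacity (maximum in-flight requests) are values you would tune for your server.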

Answered By: Rafi Aroch