Using a pre-trained ML model in Apache Flink
Question:
I am new to Flink and am trying to use a pre-trained classifier in Flink to detect hate speech on Twitter. I have an SVM classifier that I trained in Python, but I have no idea how to use it from my Flink code.
One of the posts here discusses async operations, but it goes way over my head. I have also tried PMML, but I ran into an issue that I have described in a separate question.
Are there other approaches, or simple examples, that could help me with this?
P.S. I am using Flink with Java (not PyFlink).
Answers:
You could look at Stateful Functions, which provides a bridge between Python and Java.
I don't think the documentation is clear enough, so you may want to check this thread as well.
I solved this by exposing the pre-trained model through a REST API built with Flask, with a POST endpoint that calls the model.
The server makes the model available to any client.
On the Flink side, I added a map function that acts as a client: it sends each input as JSON to the server via POST and receives the response, i.e. the prediction.
It worked splendidly!
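A minimal sketch of the client side of this approach, assuming the Flask endpoint accepts a POST whose JSON body has the form `{"text": "..."}` and returns the prediction as the response body. The URL, the payload shape, and the `ModelClient` class are hypothetical, not the poster's actual code. It uses only the JDK's `java.net.http.HttpClient` (Java 11+), so the JSON escaping is done by hand; a real job would more likely use a JSON library such as Jackson, which would also be needed to parse the reply if the server wraps the label in JSON.

```java
import java.io.Serializable;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/**
 * Hypothetical blocking client for a model server such as the Flask app described above.
 * Assumes the server accepts POST {"text": "..."} and replies with the prediction as the body.
 */
class ModelClient implements Serializable {
    private final String endpoint;      // e.g. "http://localhost:5000/predict" (assumed URL)
    private transient HttpClient http;  // HttpClient is not Serializable, so it is built lazily on the worker

    ModelClient(String endpoint) {
        this.endpoint = endpoint;
    }

    /** Sends one text to the model server, waits for the reply, and returns the response body. */
    String predict(String text) throws Exception {
        if (http == null) {
            http = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .connectTimeout(Duration.ofSeconds(5))
                    .build();
        }
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                .timeout(Duration.ofSeconds(10))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"text\": " + toJsonString(text) + "}"))
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Model server returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    static String toJsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

Inside the job, a `RichMapFunction<String, String>` would hold a `ModelClient` field and return `client.predict(tweet)` from `map()`. The `HttpClient` is `transient` and created lazily because Flink serializes the function when shipping it to the task managers. Note that each `map()` call still waits for the full HTTP round trip before the next record is processed.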
If you prefer the microservice approach, you can implement it as in the Flask example above, but more efficiently, using Flink's Async I/O operator:
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/
This way your pipeline is not blocked while it waits for each HTTP call to return.
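Flink's Async I/O operator expects an `AsyncFunction` (or `RichAsyncFunction`) whose `asyncInvoke` starts a request, returns immediately, and completes the provided `ResultFuture` later. The sketch below shows only the non-blocking HTTP part, using `HttpClient.sendAsync` from the JDK (Java 11+). It assumes a Flask-style endpoint at a hypothetical URL that accepts `{"text": "..."}` and returns the prediction as the response body; the `AsyncModelClient` class is also hypothetical. The Flink wiring is described after the code.

```java
import java.io.Serializable;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

/** Hypothetical non-blocking client; the endpoint URL and {"text": ...} payload are assumptions. */
class AsyncModelClient implements Serializable {
    private final String endpoint;      // e.g. "http://localhost:5000/predict" (assumed URL)
    private transient HttpClient http;  // not Serializable, so created lazily on the worker

    AsyncModelClient(String endpoint) {
        this.endpoint = endpoint;
    }

    /** Starts the POST and returns at once; the future completes with the response body. */
    CompletableFuture<String> predictAsync(String text) {
        if (http == null) {
            http = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .connectTimeout(Duration.ofSeconds(5))
                    .build();
        }
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                .timeout(Duration.ofSeconds(10))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"text\": " + toJsonString(text) + "}"))
                .build();
        // sendAsync does not block the calling thread; failures surface through the future.
        return http.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(response -> {
                    if (response.statusCode() != 200) {
                        throw new IllegalStateException("Model server returned HTTP " + response.statusCode());
                    }
                    return response.body();
                });
    }

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    static String toJsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

Inside a `RichAsyncFunction<String, String>`, `asyncInvoke(tweet, resultFuture)` would call `client.predictAsync(tweet)` and, in `whenComplete`, call `resultFuture.complete(Collections.singleton(label))` on success or `resultFuture.completeExceptionally(error)` on failure. The operator is then attached with something like `AsyncDataStream.unorderedWait(tweets, new ClassifyAsync(), 10, TimeUnit.SECONDS, 100)`, where `ClassifyAsync` is a hypothetical name for that function and the timeout and capacity (maximum number of in-flight requests) are illustrative values. `unorderedWait` emits each prediction as soon as it arrives; `orderedWait` preserves input order at the cost of extra latency.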