Load Machine Learning sklearn models (RandomForestClassifier) through java and send as argument to a function in python file

Question

I have an ML model that has been trained and saved as a pickle file, Randomforestclassifier.pkl. I want to load it once using Java and then run my “prediction” code, which is written in Python. My workflow is:

Read the Randomforestclassifier.pkl file (one time)
Send this model as input to a function defined in “python_file.py”, which is executed from Java for each request
python_file.py contains the prediction code, and the predictions it returns should be captured by the Java code

Please suggest how to implement this workflow.
I have used ProcessBuilder in Java to execute python_file.py, and everything works fine except for loading the model only once.

Asked By: My3

Answer 1

You could use Jep.

I have never actually tested the pickle module in Jep, but for your case it would be something like this:

try(Jep jep = new Jep()) {
    // Load model
    jep.eval("import pickle");
    jep.eval("with open('Randomforestclassifier.pkl', 'rb') as f: clf = pickle.load(f)");
    Object randomForest = jep.getValue("clf");

    ...

    // Then in another context you can pass your model to your function
    jep.eval("import predictionModule");
    jep.set("arg", randomForest);
    jep.eval("result = predictionModule.use(arg)");
    Object result = jep.getValue("result");
}

This assumes you have a module named predictionModule.py that looks something like this:

def use(model):
    # The Java snippet passes the already unpickled model, so it can be used directly
    print(model)
    # do other stuff
    ...
    return prediction

Hope this helps.

Answered By: btt

Answer 2

Simple Solution without Additional Libraries

I successfully implemented the solution described by Chandan in another post. It basically just calls the Python files through a command line from your Java application and reads the results back with a buffered reader.

https://stackoverflow.com/a/65211138/12576070

Adjustment I made for my ML application

My application involves passing a large amount of data into a trained machine learning model in Python. The feature data was too big to pass on the command line as an argument (for example, as a CSV-formatted string), so I instead saved the data as a CSV file and passed the file path as the argument to my Python prediction function.
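To make the file-path idea concrete, here is a minimal sketch of what the Python side could look like. It is not the code from my application: the function names, the argument order (model path, then CSV path) and the CSV layout (no header row, numeric cells only) are all assumptions for illustration.

```python
import csv
import pickle
import sys


def predict_from_csv(model, csv_path):
    """Read numeric feature rows from a CSV file and return the model's predictions."""
    with open(csv_path, newline="") as f:
        # Assumption: no header row and every cell is a number
        rows = [[float(cell) for cell in row] for row in csv.reader(f) if row]
    return list(model.predict(rows))


def main(model_path, csv_path):
    # Note: the model is unpickled again on every invocation of the script
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    for prediction in predict_from_csv(model, csv_path):
        # One prediction per line, so the Java side can read them line by line
        print(prediction)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python_file.py <model.pkl> <features.csv>", file=sys.stderr)
```

The Java side would then run something like `python python_file.py Randomforestclassifier.pkl features.csv` and read standard output one line at a time.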

I have not compared the speed to the Jep solution (or the jpy solution). They may well be faster, but this solution does not need any additional libraries to be installed, and it is fairly simple and straightforward. I have recently wrestled enough, unsuccessfully, with getting Java ML libraries to work with my existing application that I like this simple approach. You will, of course, need to make your own trade-off between simplicity and performance. If the ML side of my application grows enough to justify a more complex solution, I may revisit that trade-off myself.
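If the cost of starting Python and unpickling the model on every call becomes the bottleneck, one possible middle ground that still needs no extra libraries is to keep a single Python process alive and exchange one line per request over stdin/stdout. The sketch below is an untested variation, not what I actually run: the `serve` function and the line protocol (a JSON array of feature rows in, a JSON array of predictions out) are assumptions made for illustration.

```python
import json
import pickle
import sys


def serve(model, requests, responses):
    """Answer one JSON request per input line until the input stream closes."""
    for line in requests:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        rows = json.loads(line)  # e.g. [[5.1, 3.5], [6.2, 2.9]]
        # NumPy scalars are not JSON serializable, so convert them to plain Python values
        predictions = [p.item() if hasattr(p, "item") else p for p in model.predict(rows)]
        responses.write(json.dumps(predictions) + "\n")
        responses.flush()  # flush so the Java reader is not left waiting on a buffer


def main(model_path):
    with open(model_path, "rb") as f:
        model = pickle.load(f)  # loaded once, then reused for every request
    serve(model, sys.stdin, sys.stdout)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
```

On the Java side, the process would be started once with ProcessBuilder and kept open: write one line to the process's output stream per request, then read one line back from its input stream.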

Answered By: MustardMan
