Load a machine learning sklearn model (RandomForestClassifier) through Java and send it as an argument to a function in a Python file

Question:

I have an ML model that was trained and saved as a pickle file, Randomforestclassifier.pkl. I want to load it once from Java and then execute my “prediction” code, which is written in Python. So my workflow is:

  1. Read the Randomforestclassifier.pkl file (one time)
  2. Send this model as input to a function defined in “python_file.py”, which is executed from Java for each request
  3. python_file.py contains the prediction code, and the predictions it returns should be captured by the Java code

Please suggest how to implement this workflow.
I have used ProcessBuilder in Java to execute python_file.py, and everything works fine except for making the model loading a one-time activity.

Asked By: My3


Answers:

You could use Jep.

I have never actually tested the pickle module in Jep, but for your case it would be something like this:

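// Note: with recent Jep releases (3.9 and later, as far as I know) you would
// create a SharedInterpreter or SubInterpreter here instead of calling new Jep()
try (Jep jep = new Jep()) {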
    // Load model
    jep.eval("import pickle");
    jep.eval("with open('Randomforestclassifier.pkl', 'rb') as f: clf = pickle.load(f)");
    Object randomForest = jep.getValue("clf");

    ...

    // Then in another context you can pass your model to your function
    jep.eval("import predictionModule");
    jep.set("arg", randomForest);
    jep.eval("result = predictionModule.use(arg)");
    Object result = jep.getValue("result");
}

This assumes you have a module named predictionModule.py that looks something like this:

def use(model):
    # 'model' is the classifier object that was already unpickled through Jep,
    # so it can be used directly; calling pickle.loads() on it would fail
    print(model)
    # do other stuff
    ...
    return prediction
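
If the goal is only to load the model once, another option is to let the Python module cache it, so Java never has to hold the model object at all. The following is a minimal sketch under that assumption, not a tested Jep setup; the function names `load_model` and `predict` are hypothetical and not part of the original module.

```python
import pickle
from functools import lru_cache


@lru_cache(maxsize=None)
def load_model(path):
    """Unpickle the model at `path`; later calls with the same path reuse the cached object."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict(model_path, rows):
    """Return predictions for `rows` (a list of feature lists) as a plain Python list."""
    model = load_model(model_path)  # only the first call reads the file
    result = model.predict(rows)
    # scikit-learn returns a NumPy array; tolist() converts it to plain Python values
    return result.tolist() if hasattr(result, "tolist") else list(result)
```

As long as the same interpreter stays alive, each request from Java would then only need to call `predict` with the model path and the feature rows.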

Hope this helps.

Answered By: btt

Simple Solution without Additional Libraries

I successfully implemented the solution described by Chandan in another post. It basically just calls the Python file via the command line from your Java application and reads the results back through a BufferedReader.

https://stackoverflow.com/a/65211138/12576070
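
The linked approach starts a new Python process for every call, so the model is unpickled every time. One possible variation, sketched here as an assumption rather than something from the linked answer, is to start the Python process once and exchange one line per request over stdin and stdout. The function name `serve` and the JSON line format are hypothetical choices for illustration.

```python
import json
import sys


def serve(model, requests=None, responses=None):
    """Answer one JSON request per input line until the input stream is closed."""
    requests = sys.stdin if requests is None else requests
    responses = sys.stdout if responses is None else responses
    for line in requests:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        rows = json.loads(line)  # e.g. [[5.1, 3.5, 1.4, 0.2]]
        predictions = model.predict(rows)
        # NumPy arrays are not JSON serializable, so convert to a plain list first
        if hasattr(predictions, "tolist"):
            predictions = predictions.tolist()
        else:
            predictions = list(predictions)
        responses.write(json.dumps(predictions) + "\n")
        responses.flush()  # keep the Java reader from waiting on a full buffer
```

The script would unpickle Randomforestclassifier.pkl once at startup and then call `serve(model)`. On the Java side, the process started by ProcessBuilder stays open: each request is written as one line to the process's output stream and one line is read back from the BufferedReader.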

Adjustment I made for my ML application

My application involves passing a large amount of data into a trained machine learning model in Python. The feature data was too large to send over the command line as an argument (for example, as a CSV-formatted string), so I instead saved the data as a CSV file and passed the file path as the argument to my Python prediction function.
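
The Python side of that adjustment could look roughly like the sketch below. It assumes a CSV file that contains only numeric feature columns with one header row; the function names `read_features` and `predict_from_csv` are hypothetical, not taken from my actual code.

```python
import csv


def read_features(csv_path, has_header=True):
    """Read a CSV file of numeric features into a list of float rows."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the column names
        # 'if row' drops empty lines, which csv.reader yields as empty lists
        return [[float(value) for value in row] for row in reader if row]


def predict_from_csv(model, csv_path, has_header=True):
    """Run model.predict on the rows of the CSV file and return a plain list."""
    rows = read_features(csv_path, has_header)
    result = model.predict(rows)
    return result.tolist() if hasattr(result, "tolist") else list(result)
```

The script called from Java would take the file path from `sys.argv[1]`, unpickle the model, and print the returned predictions one per line so that the Java side can read them back.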

I have not compared the speed to the Jep solution (or the jpy solution). They may well be faster, but this solution does not need any additional libraries to be installed, and it is fairly simple and straightforward. I have recently wrestled enough with trying, unsuccessfully, to get Java ML libraries to work with my existing application that I like this simple approach. You will, of course, need to make your own trade-off between simplicity and performance. If the ML part of my application grows enough to justify a more complex solution, I may revisit that trade-off.

Answered By: MustardMan