Load Machine Learning sklearn models (RandomForestClassifier) through java and send as argument to a function in python file
Question:
I have an ML model that was trained and saved as a pickle file, Randomforestclassifier.pkl. I want to load it once using Java and then execute my “prediction” code, which is written in Python. My workflow is:
- Read the Randomforestclassifier.pkl file (one time)
- Send this model as input to a function defined in “python_file.py”, which is executed from Java for each request
- python_file.py contains the prediction code, and the predictions it returns should be captured by the Java code
Please provide suggestions for this workflow requirement.
I have used ProcessBuilder in Java to execute python_file.py, and everything works fine except for loading the model as a one-time activity.
Answers:
You could use Jep.
I actually never tested the pickle module in Jep, but for your case it would be something like this:
try (Jep jep = new Jep()) {
    // Load the model once and keep a reference to the Python object
    jep.eval("import pickle");
    jep.eval("with open('Randomforestclassifier.pkl', 'rb') as f: clf = pickle.load(f)");
    Object randomForest = jep.getValue("clf");
    ...
    // Then in another context you can pass your model to your function
    jep.eval("import predictionModule");
    jep.set("arg", randomForest);
    jep.eval("result = predictionModule.use(arg)");
    Object result = jep.getValue("result");
}
This assumes you have a module named predictionModule.py, which should be something like this:
def use(model):
    # The model arrives already unpickled by the Java side, so it is used directly
    print(model)
    # do other stuff
    ...
    return prediction
Hope this helps.
Simple Solution without Additional Libraries
I successfully implemented the solution described by Chandan in another post. It simply calls the Python file from the command line of your Java application and reads the results back through a BufferedReader.
https://stackoverflow.com/a/65211138/12576070
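For readers who do not want to follow the link, here is a minimal sketch of that command-line approach. It is not Chandan's code; the class name PythonRunner and the method name run are hypothetical, and it assumes the script prints its predictions to standard output, one per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs an external command and captures what it prints.
final class PythonRunner {

    /** Runs the command, returns every line it printed, and fails on a non-zero exit code. */
    static List<String> run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so a Python traceback is captured too
        // and the child cannot block on a full, unread stderr pipe.
        builder.redirectErrorStream(true);
        Process process = builder.start();

        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode + ": " + lines);
        }
        return lines;
    }
}
```

A call might look like PythonRunner.run(Arrays.asList("python", "python_file.py", "features.csv")), where the file names are placeholders. Note that every call starts a fresh Python process, so the pickled model is loaded again on each request.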
Adjustment I made for my ML application
My application involves passing a large amount of data into a trained machine learning model in Python. The feature data was too big to send over the command line as an argument (for example, as a CSV-formatted string), so I instead saved the data as a CSV file and passed the file path as the argument to my Python prediction function.
I have not compared the speed to the Jep solution (or the jpy solution). They may well be faster, but this solution does not need any additional libraries to be installed, and it is fairly simple and straightforward. I have recently wrestled enough with trying, unsuccessfully, to get Java ML libraries to work with my existing application that I like this simple approach. You will, of course, need to make your own trade-off between simplicity and performance. If the ML side of my application grows enough to justify a more complex solution, I may revisit that trade-off myself.
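The file-based hand-off can be sketched as follows. This is an illustration under assumptions rather than my exact code: the class name FeatureCsv, its methods, and the "python" executable name are hypothetical, and the features are assumed to fit in a double[][] with one row per sample.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: writes features to a CSV file and builds the command line.
final class FeatureCsv {

    /** Writes one comma-separated line per row to a temporary file and returns its path. */
    static Path write(double[][] rows) throws IOException {
        List<String> lines = new ArrayList<>();
        for (double[] row : rows) {
            lines.add(Arrays.stream(row)
                    .mapToObj(Double::toString)
                    .collect(Collectors.joining(",")));
        }
        Path csv = Files.createTempFile("features-", ".csv");
        Files.write(csv, lines, StandardCharsets.UTF_8);
        return csv;
    }

    /** Builds the command so that only the file path, not the data, is passed as an argument. */
    static List<String> command(String script, Path csv) {
        return Arrays.asList("python", script, csv.toAbsolutePath().toString());
    }
}
```

The resulting list can be handed to a ProcessBuilder. The Python script then reads the path from its arguments and loads the CSV itself, which keeps the command line short regardless of how many rows are sent.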