Embedding a Python interpreter in a multi-threaded C++ program with pybind11

Question:

I’m using pybind11 to let a third-party C++ library call Python methods. The library is multithreaded: each thread creates a Python object and then makes numerous calls to that object’s methods.

My problem is that the call to py::gil_scoped_acquire acquire; deadlocks. A minimal example that reproduces the problem is given below. What am I doing wrong?

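// main.cpp
#include <pybind11/embed.h>
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

namespace py = pybind11;
using namespace std::chrono_literals;
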
class Wrapper
{
public:
  Wrapper()
  {
    py::gil_scoped_acquire acquire;
    auto obj = py::module::import("main").attr("PythonClass")();
    _get_x = obj.attr("get_x");
    _set_x = obj.attr("set_x");
  }
  
  int get_x() 
  {
    py::gil_scoped_acquire acquire;
    return _get_x().cast<int>();
  }

  void set_x(int x)
  {
    py::gil_scoped_acquire acquire;
    _set_x(x);
  }

private:
  py::object _get_x;
  py::object _set_x;
};


void thread_func()
{
  Wrapper w;

  for (int i = 0; i < 10; i++)
  {
    w.set_x(i);
    std::cout << "thread: " << std::this_thread::get_id() << " w.get_x(): " << w.get_x() << std::endl;
    std::this_thread::sleep_for(100ms);    
  }
}

int main() {
  py::scoped_interpreter python;
  
  std::vector<std::thread> threads;

  for (int i = 0; i < 5; ++i)
    threads.push_back(std::thread(thread_func));

  for (auto& t : threads)
    t.join();

  return 0;
}

And the Python code:

# main.py
class PythonClass:
    def __init__(self):
        self._x = 0

    def get_x(self):
        return self._x

    def set_x(self, x):
        self._x = x

Related questions can be found here and here, but they did not help me solve the problem.

Asked By: bavaza


Answers:

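Python (more precisely, CPython) has a Global Interpreter Lock (GIL), which lets only one thread at a time execute Python bytecode.

So to run Python code truly in parallel inside a single process, you would basically need to write your own Python interpreter from scratch, or download the source code of Python and improve it a lot.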

If you are on Linux, you could consider running many Python interpreters (using appropriate syscalls(2), with pipe(7) or unix(7) for interprocess communication) – perhaps one Python process communicating with each of your C++ threads.

"What am I doing wrong?"

You are coding in Python something that should be coded differently. Did you consider trying SBCL?

Some libraries (e.g. TensorFlow) can be called from both Python and C++. Maybe you could take inspiration from them…

In practice, if you have just a dozen C++ threads on a powerful Linux machine, you could afford having one Python process per C++ thread. So each C++ thread would have its own companion Python process.
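The one-process-per-thread layout described above can be sketched with plain POSIX pipes. This is a minimal sketch under loud assumptions: the child is a forked stand-in that merely stores an int and echoes it back (playing the role of set_x followed by get_x), where a real setup would exec a Python interpreter instead. Companion, spawn_companion, set_and_get, stop_companion and run_with_companions are hypothetical names. All children are forked before any thread starts, since forking from an already multi-threaded process is hazardous.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <thread>
#include <vector>

constexpr ssize_t kIntSize = sizeof(int);

// Hypothetical handle for one companion process and its two pipe ends.
struct Companion {
  pid_t pid = -1;
  int to_child = -1;    // parent writes requests here
  int from_child = -1;  // parent reads replies here
};

// Forks a stand-in child. ASSUMPTION: a real setup would exec Python here.
Companion spawn_companion() {
  int down[2], up[2];
  if (pipe(down) != 0) return {};
  if (pipe(up) != 0) { close(down[0]); close(down[1]); return {}; }
  pid_t pid = fork();
  if (pid < 0) {
    close(down[0]); close(down[1]); close(up[0]); close(up[1]);
    return {};
  }
  if (pid == 0) {  // child: store each request in x and send x back
    close(down[1]); close(up[0]);
    int x = 0, request = 0;
    while (read(down[0], &request, sizeof request) == kIntSize && request >= 0) {
      x = request;
      if (write(up[1], &x, sizeof x) != kIntSize) break;
    }
    _exit(0);
  }
  close(down[0]); close(up[1]);
  Companion c;
  c.pid = pid; c.to_child = down[1]; c.from_child = up[0];
  return c;
}

// Sends a value to the companion and returns what it reports back (-1 on error).
int set_and_get(const Companion& c, int value) {
  int reply = -1;
  if (write(c.to_child, &value, sizeof value) != kIntSize) return -1;
  if (read(c.from_child, &reply, sizeof reply) != kIntSize) return -1;
  return reply;
}

void stop_companion(const Companion& c) {
  if (c.pid <= 0) return;
  const int quit = -1;  // any negative request ends the child's loop
  if (write(c.to_child, &quit, sizeof quit) != kIntSize) { /* child already gone */ }
  close(c.to_child);
  close(c.from_child);
  waitpid(c.pid, nullptr, 0);
}

// Each thread talks only to its own companion; returns the last reply per thread.
std::vector<int> run_with_companions(int n_threads) {
  assert(n_threads > 0);
  std::vector<Companion> companions;
  for (int i = 0; i < n_threads; ++i) companions.push_back(spawn_companion());

  std::vector<int> last_value(n_threads, -1);
  std::vector<std::thread> threads;
  for (int i = 0; i < n_threads; ++i) {
    threads.emplace_back([&companions, &last_value, i] {
      for (int v = 0; v < 10; ++v)
        last_value[i] = set_and_get(companions[i], 100 * i + v);
    });
  }
  for (auto& t : threads) t.join();
  for (const auto& c : companions) stop_companion(c);
  return last_value;
}
```

If every fork succeeds, run_with_companions(3) returns {9, 109, 209}: each thread's final request comes back from its own child, and no lock is shared between the threads.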

Otherwise, budget several years of work to improve the source code of Python and remove its GIL. You might write a GCC plugin to help with that task by analyzing the C code of Python.

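I managed to resolve the issue by releasing the GIL in the main thread before starting the worker threads (added py::gil_scoped_release release;). py::scoped_interpreter leaves the main thread holding the GIL, so while main() blocks in join(), no worker thread can ever acquire it. For anybody who is interested, the following now works (also added cleaning up Python objects):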

#include <pybind11/embed.h>
#include <chrono>
#include <iostream>
#include <sstream>
#include <thread>
#include <vector>

namespace py = pybind11;
using namespace std::chrono_literals;

class Wrapper
{
public:
  Wrapper()
  {
    py::gil_scoped_acquire acquire;
    _obj = py::module::import("main").attr("PythonClass")();
    _get_x = _obj.attr("get_x");
    _set_x = _obj.attr("set_x");

  }
  
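  ~Wrapper()
  {
    // Drop the references while holding the GIL. release() would give up
    // ownership without decrementing the refcount (a leak), and _obj would
    // otherwise be destroyed without the GIL held.
    py::gil_scoped_acquire acquire;
    _get_x = py::object();
    _set_x = py::object();
    _obj = py::object();
  }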

  int get_x() 
  {
    py::gil_scoped_acquire acquire;
    return _get_x().cast<int>();
  }

  void set_x(int x)
  {
    py::gil_scoped_acquire acquire;
    _set_x(x);
  }

private:
  py::object _obj;
  py::object _get_x;
  py::object _set_x;
};


void thread_func(int iteration)
{
  Wrapper w;

  for (int i = 0; i < 10; i++)
  {
    w.set_x(i);
    std::stringstream msg;
    msg << "iteration: " << iteration << " thread: " << std::this_thread::get_id() << " w.get_x(): " << w.get_x() << std::endl;
    std::cout << msg.str();
    std::this_thread::sleep_for(100ms);    
  }
}

int main() {
  py::scoped_interpreter python;
  py::gil_scoped_release release; // add this to release the GIL

  std::vector<std::thread> threads;
  
  for (int i = 0; i < 5; ++i)
    threads.emplace_back(thread_func, i); // pass the loop index so each thread is labelled

  for (auto& t : threads)
    t.join();

  return 0;
}
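
Why that one added line matters can be modelled without pybind11 at all. The sketch below is only an analogy: a std::timed_mutex named fake_gil stands in for the GIL, and ScopedRelease and run_workers are hypothetical helpers, not pybind11 APIs. The caller starts out owning the lock, as the main thread does once the interpreter is initialized, and the workers give up after a timeout instead of blocking forever so that the stuck case can be observed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for the interpreter lock. ASSUMPTION: this is NOT the real GIL.
std::timed_mutex fake_gil;

// Hypothetical analogue of py::gil_scoped_release:
// unlocks on construction, re-locks on destruction.
class ScopedRelease {
public:
  ScopedRelease() { fake_gil.unlock(); }
  ~ScopedRelease() { fake_gil.lock(); }
  ScopedRelease(const ScopedRelease&) = delete;
  ScopedRelease& operator=(const ScopedRelease&) = delete;
};

// Returns how many of the n workers obtained the lock before timing out.
int run_workers(bool release_before_join, int n = 5) {
  assert(n > 0);
  // The calling thread owns the lock, like main() after interpreter start-up.
  std::lock_guard<std::timed_mutex> held(fake_gil);

  std::atomic<int> succeeded{0};
  auto worker = [&succeeded] {
    using namespace std::chrono_literals;
    // A real gil_scoped_acquire would block forever; here we time out instead.
    if (fake_gil.try_lock_for(300ms)) {
      ++succeeded;
      fake_gil.unlock();
    }
  };

  std::vector<std::thread> threads;
  auto spawn_and_join = [&] {
    for (int i = 0; i < n; ++i) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
  };

  if (release_before_join) {
    ScopedRelease release;  // the lock is free while the caller waits in join()
    spawn_and_join();
  } else {
    spawn_and_join();       // the caller sits in join() still owning the lock
  }
  return succeeded.load();
}
```

Under these assumptions run_workers(false) returns 0, because the caller still owns the lock while it waits in join(), and run_workers(true) returns 5.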
Answered By: bavaza