OpenMP / Pybind11: Accessing Python object in for loop returns interned string error

Question:

I am trying to use OpenMP on a list of Python objects through pybind11 in C++. I convert the list into a std::vector of Python objects (as explained in this post) and then access them in a parallelized for loop. However, when I access an attribute of any Python object in the vector inside the loop, I get this error:

Fatal Python error: deletion of interned string failed
Thread 0x00007fd282bc7700 (most recent call first):
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

My questions are: what is the "deletion of interned string" error, and how can I avoid it with OpenMP?

I have read here that the problem is related to copying the string, so I tried referring to the string through a pointer, but it didn’t help. The problem also doesn’t come from a conversion issue in pybind11: if I remove the #pragma omp clause, the code works perfectly.

C++ Code

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <pybind11/stl.h>
#include <omp.h>
#include <chrono>
#include <thread>

namespace py = pybind11;

py::object create_seq(
  py::object self
  ){

  std::vector<py::object> dict = self.cast<std::vector<py::object>>();

  #pragma omp parallel for
  for(unsigned int i=0; i<dict.size(); i++) {
    dict[i].attr("attribute") = 2;
  }

  return self;
}

PYBIND11_MODULE(error, m){

    m.doc() = "pybind11 module for iterating over generations";

    m.def("create_seq", &create_seq,
      "the function which creates a sequence");

}

Python Code

import error

class test():
    def __init__(self):
        self.attribute = None

if __name__ == '__main__':
    dict = {}
    for i in range(50):
        dict[i] = test()
    pop = error.create_seq(list(dict.values()))

Compiled with:

g++ -O3 -Wall -shared -std=c++14 -fopenmp -fPIC `python3 -m pybind11 --includes` openmp.cpp -o error.so
Asked By: Joachim


Answers:

You cannot reliably call any Python C-API code (which underlies pybind11) without holding the Global Interpreter Lock (GIL). Acquiring the GIL in your OpenMP loop for each access on each thread would effectively serialize the loop, with added overhead, so it would be slower than running the loop serially in the first place.
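One common way around this is to keep Python objects out of the parallel region entirely: copy the needed values into plain C++ data while the GIL is held, release the GIL, run the OpenMP loop on that data, then write the results back serially. The following is a minimal sketch of that pattern, not a drop-in fix. The numeric attribute "value", the squaring kernel compute_fitness, and the BUILD_PYTHON_MODULE guard macro are assumptions made for illustration and do not appear in the question.

```cpp
#include <cstddef>
#include <vector>

// Pure C++ kernel: it touches no Python objects, so it is safe to run on
// OpenMP worker threads. Squaring is only a placeholder for the real work.
std::vector<double> compute_fitness(const std::vector<double>& inputs) {
    std::vector<double> out(inputs.size());
    const long n = static_cast<long>(inputs.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        out[i] = inputs[i] * inputs[i];
    }
    return out;
}

#ifdef BUILD_PYTHON_MODULE  // hypothetical opt-in macro for the binding part
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>

namespace py = pybind11;

py::object create_seq(py::object self) {
    auto objs = self.cast<std::vector<py::object>>();

    // 1. The GIL is held on entry: copy plain numbers out of the objects.
    //    "value" is an assumed numeric attribute, not part of the question.
    std::vector<double> inputs;
    inputs.reserve(objs.size());
    for (auto& o : objs) {
        inputs.push_back(o.attr("value").cast<double>());
    }

    // 2. Release the GIL while the parallel kernel runs on C++ data only.
    std::vector<double> results;
    {
        py::gil_scoped_release release;
        results = compute_fitness(inputs);
    }

    // 3. The GIL is held again here: write the results back serially.
    for (std::size_t i = 0; i < objs.size(); ++i) {
        objs[i].attr("fitness") = results[i];
    }
    return self;
}

PYBIND11_MODULE(error, m) {
    m.def("create_seq", &create_seq, "parallel kernel, serial write-back");
}
#endif
```

This only pays off when the per-element computation is expensive compared with the attribute reads and writes, which remain serial. If the loop body is nothing more than an attribute assignment, as in the question, there is no pure C++ work left to parallelize.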

As for interned strings: the Python interpreter saves common immutable objects such as certain strings and small integers to prevent them from being created over and over again. Such common strings are said to be “interned”, and this typically happens under the hood (although you can add your own using PyString_InternFromString in Python 2 or PyUnicode_InternFromString in Python 3). Since these are singleton objects by design (that’s their purpose, after all), only one thread should create or delete them.

Answered By: Wim Lavrijsen

I was able to find a solution, but I think I am just doing single-threaded work with multiple threads. I used #pragma omp ordered as follows:

std::vector<py::object> dict = self.cast<std::vector<py::object>>();
#pragma omp parallel for ordered schedule(dynamic)
for(unsigned int i=0; i<dict.size(); i++) {
  py::object genome = dict[i];
  std::cout << i << std::endl;
  #pragma omp ordered
  genome.attr("fitness")=2;
}

And this works.

EDIT

I measured the execution time with and without parallelization, and it is the same.
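The identical timings are consistent with what an ordered region does: its body runs one iteration at a time, in loop order, no matter how many threads the loop has. A small sketch without any Python involved makes this visible. The function name ordered_visit is made up for illustration.

```cpp
#include <vector>

// Records the order in which iterations pass through the ordered region.
// The region is executed sequentially in loop order, so the unsynchronized
// push_back is safe here and the result is always 0, 1, ..., n-1.
std::vector<int> ordered_visit(int n) {
    std::vector<int> order;
    order.reserve(n);
    #pragma omp parallel for ordered schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        #pragma omp ordered
        order.push_back(i);
    }
    return order;
}
```

When the only real work of the loop sits inside the ordered region, as the attribute assignment does above, the threads simply take turns and no speedup should be expected.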

Answered By: Joachim