pybind11 module import or link .so dependencies macOS

Question:

TLDR: How do I link a .so, or import a dependency, when importing my pybind11 module in Python?

I am attempting to build a pybind11 module that depends, in part, on the C++ part of a different Python library. On Linux, I can just link that library in CMake using target_link_libraries. That does not work for .so libraries on macOS (can't link with bundle (MH_BUNDLE) only dylibs (MH_DYLIB) file).

When I import the pybind11-generated module in Python on macOS without linking the dependency, I get an ImportError: dlopen(/path/to/my_module.cpython-38-darwin.so, 0x0002): symbol not found in flat namespace (__<mangled symbol that is part of the library my module depends on>). This can be prevented by importing the dependency itself in Python before importing my own module.
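
The manual workaround mentioned above can be sketched in Python as follows. This is only an illustration: the helper name `import_after` is hypothetical, and the standard-library names in the usage line are stand-ins so the snippet runs anywhere; in a real project they would be the dependency and `my_module`.

```python
import importlib
from types import ModuleType


def import_after(dependency_name: str, module_name: str) -> ModuleType:
    """Import dependency_name first, then import and return module_name."""
    # Loading the dependency first is what the question reports as
    # avoiding the "symbol not found in flat namespace" ImportError.
    importlib.import_module(dependency_name)
    return importlib.import_module(module_name)


if __name__ == "__main__":
    # Stand-in names (assumption); replace with the dependency and "my_module".
    module = import_after("json", "json.tool")
    print(module.__name__)
```

Every caller would have to go through such a helper or remember the import order, which is why the question asks for a way to make a plain import my_module do this on its own.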

Is there a way to either link that library, or to ensure that Python imports the dependency before loading my binary when running import my_module?

I tried putting the shared library file in a folder with an __init__.py that first imports the dependency and then imports * from the .so, but that broke some imports (e.g., import my_module.my_submodule fails).

EDIT: A working, although cumbersome, drop-in solution is to add a dummy module to the pipeline. I.e., rename the original my_module to _my_module, and create a dummy my_module that does nothing besides importing the dependency:

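#include <Python.h>

PyMODINIT_FUNC
PyInit_my_module(void)
{
    /* Import the dependency first so that its binary is loaded. */
    PyObject *dependency = PyImport_ImportModule("the_dependency");
    if (dependency == NULL) {
        return NULL;  /* propagate the ImportError set by Python */
    }
    Py_DECREF(dependency);
    return PyImport_ImportModule("_my_module");
}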
Asked By: Jofkos

||

Answers:

This is not an ideal solution, but it seems to be the best way to solve the import-before-binary problem while still being able to use the imported module exactly as usual. It works by using a dummy module that imports the Python dependency (which contains the associated C++ dependency as a .so) before importing the original module.

So here’s how it is done, assuming CMake is used to compile the project.

  1. Conditionally set the module name to _my_module instead of my_module if it is compiled for macOS:
if (APPLE)
    set(MAIN_LIB_NAME _my_module)
else()
    set(MAIN_LIB_NAME my_module)
endif()
    
pybind11_add_module(${MAIN_LIB_NAME}
                    src/source1.cpp
                    # your source files, as before
)
  2. Add a dummy module that takes the original name. It is then used to import the dependency and load the actual module:
if (APPLE)
    pybind11_add_module(my_module macos_dummy.h macos_dummy.cpp)
elseif (UNIX)
    # in my case, on linux I just linked against the .so
    target_link_libraries(my_module PUBLIC my_dependency)
endif()
  3. In your original module's source, define an additional PYBIND11_MODULE under the new name _my_module, so that Python can import that binary later on (i.e., pybind11 declares the PyInit__my_module function). Keep your original PYBIND11_MODULE (with the original name) alongside it:
#ifdef __APPLE__ // If apple, a dummy module is added, so that the dependency can be imported before loading the actual binary
PYBIND11_MODULE(_my_module, m) {
    m.doc() = "dummy module; doesn't do anything; if you see this instead of the actual module, something went wrong.";
}
#endif

PYBIND11_MODULE(my_module, m) {  // the original module, left unchanged
// ...
  4. Implement the actual dummy module, which uses Python’s import machinery to import the dependency, finds the original module's binary, and pretends to have been that original module all along:
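#include <dlfcn.h>
#include "macos_dummy.h"

typedef PyObject* (*PyInitFunc)(void);

PyMODINIT_FUNC PyInit_my_module(void)
{
    // import the dependency first; this is the entire reason this exists in the first place
    PyObject* dependency = PyImport_ImportModule("my_dependency");
    if (!dependency) return NULL;  // the ImportError is already set
    Py_DECREF(dependency);

    PyObject* obj = PyImport_ImportModule("_my_module");  // let Python find the correct binary
    if (!obj) return NULL;
    PyObject* file_attr = PyObject_GetAttrString(obj, "__file__");
    Py_DECREF(obj);
    const char* actual_module_path = file_attr ? PyUnicode_AsUTF8(file_attr) : NULL;  // path of the binary found by Python
    if (!actual_module_path) {
        Py_XDECREF(file_attr);
        return NULL;
    }

    void* actual_module = dlopen(actual_module_path, RTLD_LAZY | RTLD_GLOBAL);  // access the binary
    if (!actual_module) {
        PyErr_Format(PyExc_ImportError, "Module %s not found", actual_module_path);
        Py_DECREF(file_attr);
        return NULL;
    }
    Py_DECREF(file_attr);

    // dlsym returns void*, so C++ needs an explicit cast to the function pointer type
    PyInitFunc actual_pyinit = (PyInitFunc)dlsym(actual_module, "PyInit_my_module");  // retrieve the actual module's init function
    if (!actual_pyinit) {
        PyErr_SetString(PyExc_ImportError, "PyInit_my_module not found in _my_module");
        return NULL;
    }
    return actual_pyinit();
}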

and the associated header:

#ifndef MY_MODULE_MACOS_DUMMY_H
#define MY_MODULE_MACOS_DUMMY_H
#include <Python.h>

__attribute__((visibility("default"))) PyMODINIT_FUNC PyInit_my_module(void);

#endif //MY_MODULE_MACOS_DUMMY_H

That’s it. As long as both generated .so files are on Python's import path, importing the module under its original name now imports the dependency first.
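
One way to check the result is to import the module in a fresh interpreter and see whether the dependency ended up in sys.modules. The following is a minimal sketch, not part of the original setup: the helper name `dependency_loaded_by` is hypothetical, and the standard-library names in the usage line are stand-ins so it runs without the actual binaries.

```python
import subprocess
import sys


def dependency_loaded_by(module_name: str, dependency_name: str) -> bool:
    """Return True if importing module_name in a fresh interpreter
    leaves dependency_name in sys.modules."""
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({module_name!r})\n"
        f"print({dependency_name!r} in sys.modules)\n"
    )
    # A subprocess guarantees that nothing imported earlier in the
    # current session hides a missing dependency import.
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    # Use the last line in case the imported module prints on import.
    return result.stdout.strip().splitlines()[-1] == "True"


if __name__ == "__main__":
    # Stand-in names (assumption); replace with "my_module" and "my_dependency".
    print(dependency_loaded_by("json", "json.decoder"))
```

With check=True, a failing import (such as the ImportError from the question) raises subprocess.CalledProcessError instead of returning False, so a broken setup is not mistaken for a missing dependency import.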

Answered By: Jofkos