Running Python code in parallel from Rust with rust-cpython

Question:

I’m trying to speed up a data pipeline using Rust. The pipeline contains bits of Python code that I don’t want to modify, so I’m trying to run them as-is from Rust using rust-cpython and multiple threads.
However, the performance is not what I expected; it’s actually the same as running the Python code bits sequentially in a single thread.

Reading the documentation, I understand that when invoking the following, you actually get a pointer to a single Python interpreter that can only be created once, even if you run it from multiple threads separately.

    let gil = Python::acquire_gil();
    let py = gil.python();

If that’s the case, it means the Python GIL is actually preventing all parallel execution in Rust as well. Is there a way to solve this problem?

Here’s the code of my test:

use cpython::Python;
use std::thread;
use std::sync::mpsc;
use std::time::Instant;

#[test]
fn python_test_parallel() {
    let start = Instant::now();

    let (tx_output, rx_output) = mpsc::channel();
    let tx_output_1 = mpsc::Sender::clone(&tx_output);
    thread::spawn(move || {
        let gil = Python::acquire_gil();
        let py = gil.python();
        let start_thread = Instant::now();
        py.run("j=0\nfor i in range(10000000): j=j+i;", None, None).unwrap();
        println!("{:27} : {:6.1} ms", "Run time thread 1, parallel", (Instant::now() - start_thread).as_secs_f64() * 1000f64);
        tx_output_1.send(()).unwrap();
    });

    let tx_output_2 = mpsc::Sender::clone(&tx_output);
    thread::spawn(move || {
        let gil = Python::acquire_gil();
        let py = gil.python();
        let start_thread = Instant::now();
        py.run("j=0\nfor i in range(10000000): j=j+i;", None, None).unwrap();
        println!("{:27} : {:6.1} ms", "Run time thread 2, parallel", (Instant::now() - start_thread).as_secs_f64() * 1000f64);
        tx_output_2.send(()).unwrap();
    });

    // Receivers to ensure all threads run
    let _output_1 = rx_output.recv().unwrap();
    let _output_2 = rx_output.recv().unwrap();
    println!("{:37} : {:6.1} ms", "Total time, parallel", (Instant::now() - start).as_secs_f64() * 1000f64);
}
Asked By: Amaury


Answers:

The CPython implementation of Python does not allow executing Python bytecode in multiple threads at the same time. As you note yourself, the global interpreter lock (GIL) prevents this.
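This serialization is easy to observe from pure Python as well. Here is a minimal sketch (the function name `busy_sum` and the timings are illustrative, not from the question): two threads running a CPU-bound loop take about as long as running the loop twice sequentially, because the GIL lets only one thread execute bytecode at a time.

```python
import threading
import time

def busy_sum(n):
    # CPU-bound pure-Python loop; it holds the GIL while running
    j = 0
    for i in range(n):
        j += i
    return j

N = 10_000_000

# Sequential baseline: run the loop twice in one thread
start = time.perf_counter()
busy_sum(N)
busy_sum(N)
sequential = time.perf_counter() - start

# Two threads: the GIL allows only one to execute bytecode at a time
start = time.perf_counter()
threads = [threading.Thread(target=busy_sum, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
# The two timings are typically close, since the threads cannot run in parallel.
```

This is the same effect the Rust test above observes through rust-cpython.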

We don’t have any information on what exactly your Python code is doing, so I’ll give a few general hints on how you could improve the performance of your code.

  • If your code is I/O-bound, e.g. reading from the network, you will generally get nice performance improvements from using multiple threads. Blocking I/O calls will release the GIL before blocking, so other threads can execute during that time.

  • Some libraries, e.g. NumPy, internally release the GIL during long-running library calls that don’t need access to Python data structures. With these libraries, you can get performance improvements for multi-threaded, CPU-bound code even if you only write pure Python code using the library.

  • If your code is CPU-bound and spends most of its time executing Python bytecode, you can often use multiple processes rather than threads to achieve parallel execution. The multiprocessing module in the Python standard library helps with this.

  • If your code is CPU-bound, spends most of its time executing Python bytecode and can’t be run in parallel processes because it accesses shared data, you can’t run it in multiple threads in parallel – the GIL prevents this. However, even without the GIL, you can’t just run sequential code in parallel without changes in any language. Since you have concurrent access to some data, you need to add locking and possibly make algorithmic changes to prevent data races; the details of how to do this depend on your use case. (And if you don’t have concurrent data access, you should use processes instead of threads – see above.)
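For the CPU-bound case without shared data, a minimal multiprocessing sketch might look like this (the worker `busy_sum` is an illustrative stand-in for the question's Python snippet). Each worker is a separate process with its own interpreter and its own GIL, so the two loops really can run on separate cores.

```python
import multiprocessing as mp

def busy_sum(n):
    # CPU-bound work that would be serialized by the GIL in threads
    j = 0
    for i in range(n):
        j += i
    return j

if __name__ == "__main__":
    # Two worker processes run the two sums genuinely in parallel
    with mp.Pool(processes=2) as pool:
        results = pool.map(busy_sum, [10_000_000, 10_000_000])
    print(results)
```

Note that arguments and results are pickled and sent between processes, so this pays off when the work per call is large relative to the data transferred.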

Beyond parallelism, a good way to speed up Python code with Rust is to profile your Python code, find the hot spots where most of the time is spent, and rewrite these bits as Rust functions that you call from your Python code. If this doesn’t give you enough of a speedup, you can combine this approach with parallelism – preventing data races is generally easier to achieve in Rust than in most other languages.
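The profiling step can be done with the standard-library cProfile module; a minimal sketch (the `hot_loop` and `pipeline` functions are hypothetical placeholders for your real code):

```python
import cProfile
import io
import pstats

def hot_loop(n):
    # Stand-in for the expensive part of the pipeline
    return sum(i * i for i in range(n))

def pipeline():
    hot_loop(200_000)
    hot_loop(100_000)

profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

# Print the functions where most time is spent; those are the
# candidates to rewrite as Rust functions callable from Python.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```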

Answered By: Sven Marnach

If you use the PyO3 bindings, you can use the allow_threads method and callbacks to release the GIL for better parallelism: https://pyo3.rs/v0.13.2/parallelism.html

Answered By: Happy Machine