Concurrency with embedded Python in a multi-threaded C++ application

Note: this blog post focuses on Python 2.7 and boost python.

Why embed Python?

Embedding Python allows us to run scripts from within our C++ program, which can be useful in a number of ways:

- It enables us to run custom code without having to recompile the main program.
- Python may be a more flexible or better-suited tool for a specific problem or set of problems.
- We may want to expose a scripting API that drives our program.

In our case, all three are relevant. We're exposing an API that drives our program, so that we can solve specific problems using custom Python code.

Embedding Python

Embedding and running Python is pretty straightforward, especially if you're using boost python. Unless, of course, you're dealing with multiple threads.

Even then, the basic pattern is fairly straightforward:

// Acquire the GIL for this thread
PyGILState_STATE gstate = PyGILState_Ensure();

// Perform Python actions here.
result = CallSomeFunction();

// Release the GIL. No Python API calls are allowed beyond this point.
PyGILState_Release(gstate);

However, although our Python scripts may be called from multiple threads, each thread blocks at PyGILState_Ensure() until the thread currently holding the Python Global Interpreter Lock (GIL) releases it. The interpreter may switch between threads periodically, but it never runs two at once, and a thread that is inside a C++ call keeps the GIL for the whole call. In effect, our threads take turns executing their Python scripts. So how do we get some form of concurrency into our program?
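The effect can be modelled with an ordinary mutex standing in for the GIL. The sketch below does not use the Python API: the global gil mutex, realFunction() and runScript() are hypothetical stand-ins for the GIL, a slow C++ function called from Python, and a thread that acquires the GIL (as PyGILState_Ensure() does) and runs a script. Because the lock is held for the whole C++ call, at most one thread is ever inside realFunction().

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for the Python GIL (illustration only, not the Python API).
std::mutex gil;

// How many threads are inside realFunction() now, and the most ever seen.
std::atomic<int> inside{0};
std::atomic<int> peak{0};

// Hypothetical slow C++ function called from a Python script.
void realFunction()
{
    int now = ++inside;
    int seen = peak.load();
    // Raise peak to `now` if it is higher (lock-free maximum).
    while (now > seen && !peak.compare_exchange_weak(seen, now)) {}
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    --inside;
}

// One worker thread: acquire the GIL, run the "script", release the GIL.
void runScript()
{
    std::lock_guard<std::mutex> lock(gil);  // ~ PyGILState_Ensure()
    realFunction();                         // C++ call made while holding the GIL
}                                           // ~ PyGILState_Release()

// Starts `threads` workers and returns the peak concurrency inside realFunction().
int runAll(int threads)
{
    assert(threads > 0);
    peak = 0;
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i)
        workers.emplace_back(runScript);
    for (std::thread& w : workers)
        w.join();
    return peak.load();
}
```

runAll(4) returns 1 however the threads are scheduled: the four "scripts" take turns, just as threads queued on PyGILState_Ensure() do.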

Introducing concurrency

One solution is to write a delegate function for each C++ function (where appropriate) for which we want to release the GIL:

void delegateFunction()
{
  // release GIL here
  
  realFunction();
  
  // block and wait to acquire GIL here before returning to Python execution
}
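Continuing the mutex model (again a stand-in, not the Python API), a delegate simply unlocks the lock around the real work and locks it again afterwards. To show that the C++ work really does overlap, the hypothetical realFunction() waits, for up to two seconds, until every thread has entered it; that can only happen if none of them is holding the GIL while inside it.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for the Python GIL (illustration only, not the Python API).
std::mutex gil;

// Bookkeeping used to observe concurrency inside realFunction().
std::mutex statsMutex;
std::condition_variable allArrived;
int arrived = 0;   // threads that have entered realFunction() (never decremented)
int inside = 0;    // threads currently inside realFunction()
int peak = 0;      // highest value of `inside` observed
int expected = 0;  // number of threads taking part

// Hypothetical long-running C++ work that never touches Python.
void realFunction()
{
    std::unique_lock<std::mutex> lock(statsMutex);
    ++arrived;
    peak = std::max(peak, ++inside);
    allArrived.notify_all();
    // Wait (up to 2 s) until every thread has arrived. This can only succeed
    // if no thread is holding the GIL while it is in here.
    allArrived.wait_for(lock, std::chrono::seconds(2),
                        [] { return arrived == expected; });
    --inside;
}

// Delegate: release the GIL around the real work, then reacquire it.
void delegateFunction(std::unique_lock<std::mutex>& heldGil)
{
    heldGil.unlock();  // ~ PyEval_SaveThread()
    realFunction();
    heldGil.lock();    // ~ PyEval_RestoreThread(state)
}

// One worker thread "running a script" that calls the delegate.
void runScript()
{
    std::unique_lock<std::mutex> heldGil(gil);  // acquire the GIL
    delegateFunction(heldGil);                  // GIL is free during the C++ work
}                                               // GIL released on scope exit

// Starts `threads` workers and returns the peak concurrency inside realFunction().
int runAll(int threads)
{
    assert(threads > 0);
    {
        std::lock_guard<std::mutex> lock(statsMutex);
        arrived = inside = peak = 0;
        expected = threads;
    }
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i)
        workers.emplace_back(runScript);
    for (std::thread& w : workers)
        w.join();
    return peak;
}
```

Here runAll(4) should return 4: all four threads are inside realFunction() at the same time, while any "Python" work before and after the call would still be serialised by the lock.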

The observant reader will notice that, to release the GIL here using the approach from the previous example, we would need access to its gstate variable. A cleaner way to release and reacquire the GIL is to use PyEval_SaveThread() and PyEval_RestoreThread():

void delegateFunction()
{
  // release GIL here
  PyThreadState* state = PyEval_SaveThread();
  
  realFunction();
  
  // block and wait to acquire GIL here before returning to Python execution
  PyEval_RestoreThread(state);
}

nb: Resource Acquisition Is Initialisation (RAII) is useful here: wrapping the release/reacquire pair in a guard object is cleaner, and ensures the GIL is reacquired even if realFunction() throws an exception.
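A minimal RAII sketch, assuming C++11: the hypothetical ScopedRelease template releases a resource in its constructor and reacquires it in its destructor, so the reacquire step also runs when realFunction() throws. With Python, the release and reacquire functions would be PyEval_SaveThread() and PyEval_RestoreThread(); here fakeSave() and fakeRestore() are stand-ins so the sketch runs without the Python headers.

```cpp
#include <cassert>
#include <stdexcept>

// Releases a lock-like resource for the lifetime of the guard and reacquires
// it on scope exit, including when an exception propagates.
//   State:     token returned on release (PyThreadState* for Python)
//   Release:   releases and returns the token (e.g. PyEval_SaveThread)
//   Reacquire: takes the token back (e.g. PyEval_RestoreThread)
template <class State, State (*Release)(), void (*Reacquire)(State)>
class ScopedRelease
{
public:
    ScopedRelease() : m_state(Release()) {}
    ~ScopedRelease() { Reacquire(m_state); }

    // Non-copyable: the token must be restored exactly once.
    ScopedRelease(const ScopedRelease&) = delete;
    ScopedRelease& operator=(const ScopedRelease&) = delete;

private:
    State m_state;
};

// With the Python headers available, the guard could be declared as:
//   typedef ScopedRelease<PyThreadState*, PyEval_SaveThread, PyEval_RestoreThread>
//       ScopedGILRelease;
// The hypothetical stand-ins below let the sketch run without Python.
int fakeToken = 42;
bool fakeHeld = true;  // true while the fake "GIL" is held

int* fakeSave() { assert(fakeHeld); fakeHeld = false; return &fakeToken; }
void fakeRestore(int* token) { assert(!fakeHeld && token == &fakeToken); fakeHeld = true; }

typedef ScopedRelease<int*, fakeSave, fakeRestore> FakeRelease;

// Hypothetical C++ work that may fail.
void realFunction(bool fail)
{
    if (fail)
        throw std::runtime_error("realFunction failed");
}

// Delegate written with the guard: no explicit release/reacquire calls.
void delegateFunction(bool fail)
{
    FakeRelease release;  // "GIL" released here...
    realFunction(fail);
}                         // ...and reacquired here, even on exceptions
```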

That said, I have found that this can lead to intermittent deadlocks. It appears that the PyThreadState used by PyGILState_Ensure() is not always unique to each thread, so multiple threads end up attempting to restore the same PyThreadState, resulting in a deadlock.

This is simple enough to overcome by guaranteeing that each thread has its own PyThreadState. A partial solution is as follows:

// Once in each thread
m_state = PyThreadState_New(m_interpreterState);
PyEval_RestoreThread(m_state);

// Perform some Python actions here

// Release Python GIL
PyEval_SaveThread();

Here, m_interpreterState is the PyInterpreterState obtained after calling Py_Initialize():

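// Initialise the Python interpreter
Py_Initialize();

// Create GIL/enable threads (this also acquires the GIL for the main thread)
PyEval_InitThreads();

// Get the default thread state
PyThreadState* state = PyThreadState_Get();
PyInterpreterState* interpreterState = state->interp;

// Store interpreter state and use when creating new PyThreadStates

// Release the GIL held by the main thread; otherwise worker threads
// will block forever in PyEval_RestoreThread()
PyThreadState* mainThreadState = PyEval_SaveThread();

// At shutdown, reacquire the GIL before finalising:
// PyEval_RestoreThread(mainThreadState);
// Py_Finalize();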

So, now we can guarantee that each thread has its own PyThreadState, and using delegate functions we can release/acquire the GIL where required to achieve some sort of concurrency. Great!
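One way to make sure each thread creates its PyThreadState exactly once is to cache it in thread-local storage (C++11 thread_local). The sketch below models the idea without Python: FakeInterpreter, FakeThreadState and createThreadState() are hypothetical stand-ins for PyInterpreterState, PyThreadState and PyThreadState_New(), and cleanup when a thread exits is omitted.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for PyInterpreterState and PyThreadState.
struct FakeInterpreter {};
struct FakeThreadState { FakeInterpreter* interp; int id; };

std::atomic<int> statesCreated{0};

// Stand-in for PyThreadState_New(interp): allocates a fresh thread state.
FakeThreadState* createThreadState(FakeInterpreter* interp)
{
    return new FakeThreadState{interp, ++statesCreated};
}

// Returns this thread's state, creating it on first use ("once in each thread").
FakeThreadState* stateForThisThread(FakeInterpreter* interp)
{
    thread_local FakeThreadState* state = nullptr;
    if (state == nullptr)
        state = createThreadState(interp);
    return state;
}
```

In a real program, the pointer returned for a thread would be the one passed to PyEval_RestoreThread() before that thread runs Python code.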

However, sometimes creating a delegate function for each C++ API function isn't feasible. This is where boost python's call policies can be really useful.

Boost Python call policies

Every time Python calls a C++ function through a boost python binding, it applies a call policy (default_call_policies unless another is specified). A call policy provides precall() and postcall() functions, which are called before and after your C++ function, respectively.

So, rather than repeating the same release/reacquire code in every delegate function, we can create our own call policy:

namespace boost { namespace python {
  
struct release_gil_policy
{
  // Ownership of this argument tuple will ultimately be adopted by
  // the caller.
  template <class ArgumentPackage>
  static bool precall(ArgumentPackage const&)
  {
    // Release GIL and save PyThreadState for this thread here

    return true;
  }

  // Pass the result through
  template <class ArgumentPackage>
  static PyObject* postcall(ArgumentPackage const&, PyObject* result)
  {
    // Reacquire GIL using PyThreadState for this thread here

    return result;
  }

  typedef default_result_converter result_converter;
  typedef PyObject* argument_package;

  template <class Sig> 
  struct extract_return_type : mpl::front<Sig>
  {
  };

private:
  // Retain pointer to PyThreadState on a per-thread basis here

};
}
}

nb: A call policy's precall() and postcall() are static and shared by every thread, so if we saved the PyThreadState in a single static variable, one thread's value would be overwritten by another's. A map keyed by thread ID could be used instead.
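A sketch of that map, assuming C++11: saved_thread_states is a hypothetical helper that stores one saved state per thread ID behind its own mutex (needed because the GIL is not held while precall() and postcall() touch it). FakeThreadState stands in for PyThreadState so the sketch runs without Python; in the policy, precall() would call saved_thread_states::save(PyEval_SaveThread()) and postcall() would call PyEval_RestoreThread(saved_thread_states::take()).

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <thread>

// Hypothetical stand-in for PyThreadState.
struct FakeThreadState { int id; };

// Stores the state saved by precall() for each thread, keyed by thread ID.
// All members are static, like a call policy's precall()/postcall().
struct saved_thread_states
{
    // Records the calling thread's saved state.
    static void save(FakeThreadState* state)
    {
        std::lock_guard<std::mutex> lock(states_mutex());
        states()[std::this_thread::get_id()] = state;
    }

    // Removes and returns the state saved by the calling thread.
    static FakeThreadState* take()
    {
        std::lock_guard<std::mutex> lock(states_mutex());
        auto it = states().find(std::this_thread::get_id());
        assert(it != states().end() && "take() called without a matching save()");
        FakeThreadState* state = it->second;
        states().erase(it);
        return state;
    }

private:
    // Function-local statics avoid the need for out-of-class definitions.
    static std::mutex& states_mutex()
    {
        static std::mutex m;
        return m;
    }

    static std::map<std::thread::id, FakeThreadState*>& states()
    {
        static std::map<std::thread::id, FakeThreadState*> s;
        return s;
    }
};
```

Storing one state per thread assumes these calls don't nest on the same thread. C++11 thread_local storage would be a lighter-weight alternative to the map, since it needs no lock.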

We can then specify the call policy when creating the boost python binding for a C++ function:

def("myFunction", make_function(&myFunction, release_gil_policy()));
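To make the order of events concrete, here is a simplified model of a caller driving a policy. It is not boost python's implementation: call_with_policy() and tracing_policy are hypothetical, the hooks take no argument package, and the argument and result conversions performed by the real caller are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified model of a caller using a call policy (hypothetical, not boost).
template <class Policy, class F>
int call_with_policy(F f)
{
    if (!Policy::precall())  // a policy may veto the call
        return -1;
    int result = f();        // the wrapped C++ function
    return Policy::postcall(result);
}

// Records the order in which the hooks and the function run.
std::vector<std::string> trace;

struct tracing_policy
{
    static bool precall() { trace.push_back("precall"); return true; }
    static int postcall(int result) { trace.push_back("postcall"); return result; }
};

// Hypothetical wrapped C++ function.
int myFunction()
{
    trace.push_back("myFunction");
    return 7;
}
```

With release_gil_policy, the GIL would be released in precall() and reacquired in postcall(), so it is free for exactly the span in which myFunction() runs in this model.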

Outcome

Now that we have a custom call policy that will release and reacquire the GIL for us, what exactly is happening?

Each time a C++ API function that uses release_gil_policy() is called, precall() will release the Python GIL. This allows another thread to acquire the GIL and start executing Python code. When the second thread calls a release_gil_policy() enabled C++ API function, it too will release the GIL during the precall() function. When their respective C++ API function calls are completed, postcall() will attempt to reacquire the GIL for that thread. Once the GIL is reacquired, the thread can then continue executing Python code, while the other thread waits until it can obtain the GIL.

The Python interpreter will still only execute Python code for one thread at a time, however. But when that Python code makes a C++ function call, we can let another thread use the interpreter instead of being greedy with it.

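A word of caution: although it might seem like a good idea to use release_gil_policy() on every function call, it might not be wise to do so. Releasing and reacquiring the GIL has a cost, and for short C++ function calls that overhead may outweigh any benefit.

It is also worth knowing that boost python performs part of the argument conversion and all of the return-value conversion between precall() and postcall(), i.e. while this policy has released the GIL. Those conversions use the Python API (even returning None from a void function updates a reference count), so a call policy that releases the GIL can introduce subtle races. Releasing the GIL inside a delegate function, around the C++ work only, avoids this.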

Using delegate functions or a custom call policy, the Python GIL can easily be released and reacquired on a per-function basis for our C++ API, giving our embedded Python scripts concurrency wherever they spend time in C++ code.

If you have any questions, comments or additions please post them below.

-Sam

Samuel Jones

CEO & Lead software consultant
