By Dumitrita Munteanu

Multi-threading in the C++11 Standard

Thirteen years after the first C++ standard was published, with the introduction of the new C++11 (formerly C++0x) standard, the members of the C++ Standards Committee decided to make a significant change to multi-threaded programming. For the first time, the C++ language offers support for implementing concurrent applications regardless of the development platform. Before the C++11 standard, multi-threaded applications relied on platform-specific extensions such as Intel TBB, OpenMP or Pthreads. Portability is the major advantage brought by this new feature (e.g. Windows multi-threaded applications are easily ported to iPhone or Android platforms). For those familiar with the Boost thread library, a further advantage is that many concepts in the standard C++11 library keep the same names and structure as the Boost thread classes.
The C++ Standard Library includes classes for thread manipulation and synchronization, protected shared data and low-level atomic operations. We will next give a general description of these concepts and exemplify how they appear in the C++11 standard.

Starting a thread

Starting a thread in C++11 is as simple as declaring and instantiating a new object. We will analyze a simple multithreaded application to demonstrate how we can use the threads from the C++ standard library. For example:


#include <iostream>
#include <thread> //-- (1)

void execute()
{
    std::cout << "Hello Concurrent World" << std::endl;
}

int main()
{
    std::thread worker_thread(execute); //-- (2)

    worker_thread.join();               //-- (3)
}

The first difference is the inclusion of the header file #include <thread> (1). This header contains the functions and classes for thread management. A thread is started by defining a std::thread object whose constructor receives an initial function to be executed by the thread, in our case the execute() function, which is where the new thread will start its execution (2). After the thread is launched and before the std::thread object is destroyed by exiting the main function, it must be explicitly specified whether the main thread should wait until the secondary thread completes its execution (by calling the join() function (3)) or whether the secondary thread should run independently of the life cycle of the main thread (by calling the detach() method). In the latter case the thread runs in the background, with no means to communicate with it. Detached threads are also known as daemon threads, after the Unix concept of a daemon process that runs in the background without a specific interface.
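As a complement to the join() example above, here is a minimal sketch of the detach() alternative; the function names are illustrative, not from the original article. The key observable effect is that after detach() the std::thread object no longer owns a thread of execution, so joinable() returns false.

```cpp
#include <thread>

// Hypothetical background task; kept trivial for illustration.
void log_in_background()
{
    // ... work that needs no further communication with the launcher ...
}

// Launch a daemon-style thread. After detach() the std::thread object
// no longer refers to any thread of execution.
bool start_background_logger()
{
    std::thread background_thread(log_in_background);
    background_thread.detach();
    return background_thread.joinable(); // false once detached
}
```

Note that a detached thread can never be joined again, so it must not touch any data that may be destroyed before it finishes.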

Passing parameters to a thread function

Passing parameters to a new thread in C++11 is as simple as passing parameters to any callable object. It is important to note that the parameters are passed as local copies into the thread's stack, from where they can be accessed by the running thread, even if the corresponding parameter of the function expects a reference. For example:


void execute(const std::string& filename);

std::thread worker_thread(execute, "input.dat");

In this case a new thread associated with the worker_thread variable is created, and it calls the execute("input.dat") function. Pay attention to the passed parameter: even though the execute function receives a reference to std::string as a parameter, the string literal is passed as a const char* and converted to std::string only in the new thread. The manner in which parameter passing to a std::thread function works can lead to two possible problems.

The first case would be the one in which the transmitted parameter is a pointer to a local variable. For example:


void execute(const std::string& filename);

void oops(const char* parameter)
{
    char buffer[50];
    sprintf(buffer, "%s", parameter);
    std::thread worker_thread(execute, buffer); //-- (1)
    worker_thread.detach();
}

In this example, a pointer to the local variable buffer is passed to the new thread, whose function expects a std::string parameter. It is possible for the oops function to complete before the conversion from char* to std::string takes place. The passed pointer then becomes a dangling pointer (1) and the thread's behaviour is undefined.

The solution is to explicitly convert the buffer variable to a std::string before the parameter is passed to the constructor of the new thread:

std::thread worker_thread(execute, std::string(buffer));

The second case is when the passed parameter is copied, even though the intention was to send a reference to the object whose value was to be changed by the thread. For example:

void execute(std::string& str)  //-- (1)
{
    str.assign("Hallo Welt!");
}

void oops(const char* parameter)
{
    std::string greeting("Hello World!");
    std::thread worker_thread(execute, greeting); //-- (2)
    worker_thread.join();
    std::cout << greeting << std::endl;           //-- (3)
}

Even though the execute function (1) expects a reference as a parameter, the constructor of the new thread doesn't know this, which is why it simply copies the greeting variable internally. When the thread calls the execute function (2), it passes as parameter a reference to that internal copy of the greeting variable. Therefore, when the new thread completes its execution, the internal copy (with the new value "Hallo Welt!") is destroyed together with the other internal copies of the parameters passed to the constructor of the thread. For this reason, the value of the greeting variable remains unchanged, showing the initial value "Hello World!" (3). In this case the solution is to wrap the parameters that really must be passed by reference in the std::ref function.

std::thread worker_thread(execute, std::ref(greeting));

The std::ref function is also available only from the C++11 standard onward and is used to simulate pass-by-reference, so that, in the end, the greeting variable (3) holds the changed value "Hallo Welt!".

For those familiar with the std::bind function, the semantics of passing a pointer to a member function of an object as a parameter to the constructor of a std::thread will be unsurprising. For example:
class Test
{
public:
    void execute();
};

Test custom_test;
std::thread worker_thread(&Test::execute, &custom_test); //-- (1)

This code launches a thread that runs the custom_test.execute() method (1). If the execute() method received parameters, they would be passed in the same way, as the third, fourth, and so on arguments to the constructor of the thread.
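To make the argument order concrete, here is a small sketch, with an illustrative Counter class not taken from the article: the member function pointer comes first, then the object to invoke it on, then the member function's own arguments.

```cpp
#include <thread>

class Counter
{
public:
    void add(int amount) { value_ += amount; }
    int value() const { return value_; }
private:
    int value_ = 0;
};

int run_member_function_thread()
{
    Counter counter;
    // Argument 1: pointer to member function; argument 2: object to call
    // it on; remaining arguments are forwarded to the member function.
    std::thread worker(&Counter::add, &counter, 5);
    worker.join();
    return counter.value();
}
```

Passing &counter (rather than counter) matters here: passing the object by value would copy it, and the thread would increment the copy.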

Transferring thread ownership

Suppose you want to write a create_thread() function that creates a thread which runs in the background but, instead of waiting for the new thread to complete its execution, returns the new std::thread to the function from which it was called. Or suppose that a function creates a thread and passes its ownership to some other function that has to wait for the newly created thread to complete its execution. In both cases it is necessary to transfer the ownership of the std::thread, which, similarly to a std::ifstream or a std::unique_ptr, can be moved but not copied. For example:

void some_function(int n);

std::thread create_thread()
{
    std::thread my_thread(some_function, 24);
    return my_thread;
}

std::thread first_thread(some_function, 25);         //-- (1)
std::thread second_thread = std::move(first_thread); //-- (2)
first_thread = create_thread();                      //-- (3)

First, a new thread is launched (1) and associated with the first_thread variable. Ownership of the new thread is then transferred to the second_thread variable (2). The next ownership transfer (3) doesn't require calling the std::move function, because the current owner is a temporary object and the transfer is automatic and implicit.

Synchronization mechanisms

For synchronization between threads, the C++11 standard provides classical synchronization mechanisms such as mutex objects (std::mutex, std::recursive_mutex, etc.) and condition variables (std::condition_variable, std::condition_variable_any), which can be used through RAII locks (resource acquisition is initialization: std::lock_guard and std::unique_lock), as well as mechanisms called futures and promises, used to transfer results between threads, and std::packaged_task, which "packs" a call to a function that can generate such a result.

Mutex

We saw in the previous examples how threads can be launched and how parameters can be passed to a thread function. This mechanism, however, is not enough when certain resources must be used (modified) by multiple threads running simultaneously. This situation requires a mutual exclusion mechanism that ensures data integrity whenever multiple threads may modify the same resource at the same time.

The most commonly used synchronization primitive is the mutex. Before accessing a resource that can be simultaneously modified by different threads, a thread must lock the mutex associated with the shared resource, and when the thread no longer operates on the shared data, the mutex must be unlocked. If a mutex is already locked and another thread tries to lock it, that thread must wait until the thread that successfully locked the mutex unlocks it.
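The lock/wait/unlock protocol just described can be sketched as follows; the balance/deposit names are illustrative, not from the article.

```cpp
#include <mutex>

std::mutex balance_mutex; // protects the shared variable 'balance'
int balance = 0;

void deposit(int amount)
{
    balance_mutex.lock();   // blocks here if another thread holds the mutex
    balance += amount;      // critical section: one thread at a time
    balance_mutex.unlock(); // must happen on every exit path
}
```

As the next paragraphs explain, calling lock()/unlock() by hand like this is fragile (an early return or exception skips the unlock), which is why std::lock_guard is preferred.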

The C++11 standard provides the std::mutex primitive, which can be used by including the header <mutex>. A std::mutex object provides the member functions lock() and unlock() to explicitly lock and unlock the mutex. The most common use of a mutex is to protect a particular block of code. To this end the C++ Standard Library provides the std::lock_guard<> template, whose mechanism is based on the RAII principle (resource acquisition is initialization): the mutex is locked in the std::lock_guard object's constructor and automatically unlocked in its destructor. For example:

std::mutex my_mutex;
unsigned int counter = 0;

unsigned int increment()
{
    std::lock_guard<std::mutex> lock_counter(my_mutex);
    return ++counter;
}

unsigned int query()
{
    std::lock_guard<std::mutex> lock_counter(my_mutex);
    return counter;
}

In this example, access to the counter variable is serialized. If more than one thread calls the query() method concurrently, all but one are blocked until the thread that successfully locked the mutex unlocks it. Since both functions lock the same mutex, if one thread calls query() and another calls increment() at the same time, only one of them can lock the mutex and access the counter variable.

With respect to exception handling, a std::lock_guard<> variable brings additional benefits compared to manually locking and unlocking by calling lock() and unlock() directly on the mutex. With manual locking, one must make sure the mutex is unlocked on every exit from the protected region, including those where execution terminates prematurely because an exception is thrown. This safe behaviour is guaranteed by a std::lock_guard object: when an exception is thrown, the destructor of the std::lock_guard object is automatically called by the stack unwinding mechanism.
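The exception-safety argument can be demonstrated with a small sketch; the function names and the 'fail' flag are illustrative assumptions. The lock_guard's destructor releases the mutex even on the throwing path, which we can observe with try_lock().

```cpp
#include <mutex>
#include <stdexcept>

std::mutex data_mutex;

void update_shared_data(bool fail) // 'fail' simulates an error path
{
    std::lock_guard<std::mutex> guard(data_mutex);
    if (fail)
        throw std::runtime_error("update failed");
    // ... modify shared data ...
} // guard's destructor unlocks data_mutex on both the normal
  // and the exceptional path

bool mutex_released_after_exception()
{
    try { update_shared_data(true); }
    catch (const std::runtime_error&) { /* swallowed for the demo */ }
    // If the mutex had stayed locked, try_lock() would return false.
    bool released = data_mutex.try_lock();
    if (released)
        data_mutex.unlock();
    return released;
}
```

With manual lock()/unlock() the throw above would have left data_mutex locked forever, deadlocking the next caller.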

Condition Variables

In the previous examples we discussed ways to protect data shared between multiple threads. Sometimes protecting the shared data is not enough: one also needs to synchronize the operations executed by different threads. Typically, one wants a thread to wait until an event occurs or until a condition becomes true. To this end, the C++ Standard Library provides primitives such as condition variables and futures.

In the C++11 standard, condition variables have not one but two implementations: std::condition_variable and std::condition_variable_any. Both can be used by including the <condition_variable> header. To facilitate communication between threads, a condition variable is associated with a mutex (for std::condition_variable) or with any other mechanism that provides mutual exclusion (for std::condition_variable_any).
A thread waiting for a condition to become true should first lock a mutex using the std::unique_lock primitive, the necessity of which we shall see later. The mutex is atomically unlocked when the thread starts waiting on the condition variable. When a notification for the condition variable arrives, the thread is woken up and locks the mutex again. A practical example is a buffer used to transmit data between two threads:

std::mutex mutex;
std::queue<buffer_data> buffer;
std::condition_variable buffer_cond;

void data_preparation_thread()
{
    while(has_data_to_prepare())                 //-- (1)
    {
        buffer_data data = prepare_data();
        std::lock_guard<std::mutex> lock(mutex); //-- (2)
        buffer.push(data);
        buffer_cond.notify_one();                //-- (3)
    }
}

void data_processing_thread()
{
    while(true)
    {
        std::unique_lock<std::mutex> lock(mutex);            //-- (4)
        buffer_cond.wait(lock, []{return !buffer.empty();}); //-- (5)
        buffer_data data = buffer.front();
        buffer.pop();
        lock.unlock();                                       //-- (6)
        process(data);
        if(is_last_data_entry(data))
            break;
    }
}

When data is ready for processing (1), the thread preparing the data locks the mutex (2) to protect the buffer while it adds the new values. It then calls the notify_one() method on the buffer_cond condition variable (3) to notify the thread waiting for data (if any) that the buffer contains data to be processed. The thread that processes the data from the buffer first locks the mutex, but this time using a std::unique_lock (4). It then calls the wait() method on the buffer_cond condition variable (5), passing it the lock object and a lambda function expressing the condition being waited for. Lambda functions are another feature specific to the C++11 standard, enabling anonymous functions to be written as part of other expressions. In this case the lambda function []{return !buffer.empty();} is written inline in the source code and checks whether there is data to be processed in the buffer. The wait() method checks whether the condition is true (by calling the lambda function that was passed) and, if so, returns. If the condition is not met (the lambda function returns false), the wait function unlocks the mutex and puts the thread in a blocked, waiting state.

When the condition variable is notified by the call to notify_one() from data_preparation_thread(), the thread processing the data wakes up, locks the mutex again and re-checks the condition, leaving the wait() method with the mutex still locked if the condition is met. If the condition is not met, the thread unlocks the mutex and waits again. This is why std::unique_lock is used: the thread that processes the data must unlock the mutex while waiting and lock it again afterwards, and std::lock_guard doesn't provide this flexibility. If the mutex remained locked while the waiting thread is blocked, the thread that prepares the data could never lock the mutex to insert new values into the buffer, and the processing thread would never see its condition met. The flexibility of unlocking a std::unique_lock object is used not only in the wait() call, but also once the data has been retrieved, before it is actually processed (6). This is because the buffer is only used to transfer data from one thread to the other, and the mutex should not be held during data processing, which could be a time-consuming operation.

Futures

Another synchronization mechanism is the future, i.e. an asynchronous return object (an object that reads a result shared between threads), implemented in the C++11 Standard Library through two class templates declared in the <future> header: unique futures (std::future<>) and shared futures (std::shared_future<>), modeled after the std::unique_ptr and std::shared_ptr mechanisms. For example, suppose we have an operation that performs a very time-consuming calculation whose result is not needed immediately. In this case we can start a new thread to perform the operation in the background, but we then need the result to be transferred back to the function that launched the thread, and the std::thread object does not include a mechanism for this. Here the function template std::async, also declared in the <future> header, comes in.

std::async is used to launch an asynchronous operation whose result is not immediately necessary. Instead of giving us a std::thread object to wait on, the std::async function returns a std::future object that will encapsulate the result of the operation. When the result is needed, one calls the get() method on the std::future object, and the calling thread is blocked until the future is ready, meaning it can provide the result of the operation. For example:

#include <future>
#include <iostream>

int  long_time_computation();
void do_other_stuff();

int main()
{
    std::future<int> the_result = std::async(long_time_computation);

    do_other_stuff();

    std::cout << "The result is " << the_result.get() << std::endl;
}

std::async is a high-level utility that provides an asynchronous result and internally deals with creating an asynchronous provider and preparing the shared state when the operation ends. This can be emulated with a std::packaged_task object (or std::bind and std::promise) and a std::thread, but using std::async is safer and easier.
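To illustrate that emulation, here is a rough sketch of a std::async-like helper built from std::packaged_task and std::thread; the function names are illustrative and the sketch ignores launch policies and exception propagation details that the real std::async handles.

```cpp
#include <future>
#include <thread>

int heavy_computation() { return 42; } // stand-in for real work

// Rough equivalent of std::async(heavy_computation): the packaged_task
// provides the future, the thread provides the asynchronous execution.
std::future<int> manual_async()
{
    std::packaged_task<int()> task(heavy_computation);
    std::future<int> result = task.get_future();
    std::thread worker(std::move(task)); // packaged_task is move-only
    worker.detach();                     // std::async manages this for us
    return result;
}
```

Calling manual_async().get() blocks until the detached thread has run the task and made the future ready, just as with the std::async version.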

Packaged tasks

A std::packaged_task<> object ties a future to a function or callable object. When the std::packaged_task<> object is invoked, it calls the associated function or callable object and makes the future ready, with the return value of the operation as the associated value. This mechanism can be used, for example, when each operation has to be executed by a separate thread or run sequentially on a background thread. If a large operation can be divided into several sub-operations, each of these can be wrapped in a std::packaged_task<> instance, which is then handed to an operations manager. The details of each operation are thus abstracted away, and the manager operates only with std::packaged_task<> instances rather than with individual functions. For example:

#include <cmath>
#include <functional>
#include <future>
#include <iostream>

int execute(int x, int y) { return std::pow(x, y); }

int main()
{
    std::packaged_task<int()> task(std::bind(execute, 2, 10));
    std::future<int> result = task.get_future();          //-- (1)

    task();                                               //-- (2)

    std::cout << "task_bind:\t" << result.get() << '\n';  //-- (3)
}

When the std::packaged_task object is invoked (2), the execute function associated with it is called with the bound parameters 2 and 10, and the result of the operation is asynchronously stored in the std::future object (1). Thus, it is possible to encapsulate an operation in a std::packaged_task and obtain the std::future object holding the result of the operation before the std::packaged_task object is invoked. When the result of the operation is required, it can be obtained once the std::future object is in the ready state (3).

Promises

As we saw in the Futures section, data can be sent between threads by passing it as parameters to the thread function, and a result can be obtained through the future returned by std::async. Another mechanism for transmitting data resulting from operations performed by different threads is the std::promise/std::future pair. A std::promise<T> object provides a mechanism to set a value of type T, which can then be read through a std::future<T> object. While the std::future object allows access to the resulting data (using the get() method), the promise object is responsible for providing the data (using one of the set_...() methods). For example:

#include <future>
#include <iostream>
#include <string>
#include <thread>

void execute(std::promise<std::string>& promise)
{
    std::string str("processed data");
    promise.set_value(std::move(str));                     //-- (3)
}

int main()
{
    std::promise<std::string> promise;                     //-- (1)
    std::thread thread(execute, std::ref(promise));        //-- (2)
    std::future<std::string> result(promise.get_future()); //-- (4)
    std::cout << "result: " << result.get() << std::endl;  //-- (5)
    thread.join();
}

After including the header <future>, where std::promise is declared, a promise object is declared, specialized for the type of value it must hold, std::string (1). The std::promise object internally creates a shared state, used to store the value of type std::string, which the std::future object will use to obtain that value once the thread has run.
The promise is then passed as a parameter to the function of a separate thread (2). The moment the value of the promise object is set inside the thread (3), the shared state becomes ready. To obtain the value set in the execute function, a std::future object sharing the same state as the std::promise object is needed (4). Once the future object is created, its value can be obtained by calling the get() method (5). It is important to note that the current thread (the main thread) remains blocked until the shared state is ready (that is, until set_value has been executed (3)) and the data is available. The use of objects such as std::promise is not exclusive to multi-threaded programming: they can also be used in single-threaded applications, to keep a value or an exception to be processed later through a std::future.

Atomics

In addition to the mutual exclusion mechanisms above, the C++11 standard also introduces atomic types. The atomic type std::atomic<T> can be used with any trivially copyable type T and ensures that any operation involving the std::atomic<T> object is atomic, meaning that it is executed in its entirety or not at all. One of the advantages of using atomic types for mutual exclusion is performance, since they enable lock-free techniques, a much more economical approach than a mutex, which can be relatively expensive in terms of resources and latency due to the mutual exclusion.

The main operations provided by the std::atomic class template are the store and load functions, which set and return the atomic value stored in the std::atomic object. Another method specific to these objects is the exchange function, which sets a new value for the atomic object and returns the previously stored value. Two more methods are also available, compare_exchange_weak and compare_exchange_strong, which perform an atomic change only if the current value equals the expected value. These last two functions can be used to implement lock-free algorithms. For example:

#include <atomic>

std::atomic<int> counter(0); //-- (1)

void increment()
{
    ++counter;               //-- (2)
}

int query()
{
    return counter.load();
}

In this example the <atomic> header, where the std::atomic<> class template is declared, is included first. Then an atomic counter object is declared (1). Basically one can use any trivial, integral, or pointer type as a parameter for the template. Note, however, the initialization of the std::atomic<int> object: it must always be initialized explicitly, because the default constructor leaves it uninitialized. Unlike the example presented in the Mutex section, the counter variable can be incremented directly (2), without a mutex, since both the member functions of the std::atomic object and operations such as assignment, automatic conversion, increment and decrement are guaranteed to run atomically. It is advisable to use atomic types whenever atomic operations are needed, especially on integral types.
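The compare_exchange functions mentioned above deserve a sketch of their own, since the increment example does not use them. Here is a classic lock-free "update maximum" loop; the names are illustrative, not from the article. compare_exchange_weak installs the new value only if the atomic still holds the expected one, and on failure it reloads the expected value so the loop can retry.

```cpp
#include <atomic>

std::atomic<int> maximum(0);

// Lock-free "raise the maximum" via a compare-and-swap loop: retry until
// either our value is installed or another thread has stored a larger one.
void update_maximum(int value)
{
    int current = maximum.load();
    while (value > current &&
           !maximum.compare_exchange_weak(current, value))
    {
        // compare_exchange_weak updated 'current' with the value another
        // thread stored; the loop condition re-checks against it.
    }
}
```

The weak variant may fail spuriously, which is harmless inside a retry loop like this; compare_exchange_strong avoids spurious failures at a potentially higher cost.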

Conclusions

In the previous sections we have outlined how threads can be used in the C++11 standard, covering both thread management and the mechanisms used to synchronize data and operations: mutexes, condition variables, futures, promises, packaged tasks and atomic types. As can be seen, using the threads from the C++ Standard Library is not difficult, and they rely on basically the same mechanisms as the threads from the Boost library. However, the complexity increases with the complexity of the code design, which must behave as expected. For a better grasp of the topics above, and to expand your knowledge of the new concepts available in the C++11 standard, I highly recommend the book by Anthony Williams, C++ Concurrency in Action, and the latest edition of the classic The C++ Standard Library, by Nicolai Josuttis. There you will find not only a deeper treatment of the topics presented above, but also other features specific to the C++11 standard, including techniques for using them for multi-threaded programming at an advanced level.