10 Concurrency

Modern CPUs have multiple cores, but the programs you have written so far use only one. Concurrency lets your program do several things at the same time — processing data on one thread while waiting for I/O on another, or splitting a large computation across cores to finish faster. Concurrency is powerful but treacherous: shared data accessed from multiple threads without coordination leads to data races — subtle, non-deterministic bugs that are among the hardest to diagnose. In this chapter you will learn to create threads, protect shared data with mutexes, coordinate threads with condition variables, use std::async for higher-level concurrency, update simple counters and flags with atomics, and avoid common pitfalls.

10.1 std::thread

std::thread (#include <thread>) creates a new thread of execution:

#include <iostream>
#include <thread>

void hello() {
    std::cout << "Hola from a thread!\n";
}

int main() {
    std::thread t(hello);  // start a new thread running hello()
    t.join();              // wait for it to finish

    return 0;
}
Hola from a thread!

t.join() blocks the calling thread until t finishes. You must either join() or detach() every thread before it is destroyed, or the program calls std::terminate.

10.1.1 Passing Arguments

You can pass arguments to the thread function:

#include <iostream>
#include <string>
#include <thread>

void play(const std::string& song, int track_num) {
    std::cout << "Track " << track_num << ": " << song << "\n";
}

int main() {
    std::thread t(play, "Hung Up", 1);
    t.join();

    return 0;
}
Track 1: Hung Up

Trap: Arguments are copied into the thread by default. If you need to pass by reference, use std::ref() from <functional>:

int count = 0;
std::thread t([](int& c) { c++; }, std::ref(count));
t.join();
// count is now 1

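Without std::ref, this code does not even compile. The thread stores its own copy of count and passes that copy to the lambda as an rvalue, and an rvalue cannot bind to int&. If the parameter were a plain int instead, the code would compile, but the thread would increment its private copy and count would stay 0.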

10.1.2 Detaching Threads

t.detach() lets the thread run independently. The thread continues even after the std::thread object is destroyed:

std::thread t(background_task);
t.detach();  // thread runs on its own
// Careful: if background_task refers to locals of this scope, they may be destroyed while it still runs!

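Trap: A detached thread can never be joined, so you have no direct way to wait for it or to learn when it has finished. If it accesses variables from the creating scope, those variables may already be destroyed. If main returns while the thread is still running, the whole process exits and takes the thread with it. Prefer join() unless you have a specific reason to detach.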

10.1.3 Lambdas as Thread Functions

Lambdas are the most common way to write thread functions:

#include <iostream>
#include <thread>

int main() {
    int result = 0;

    std::thread t([&result]() {
        result = 42;
    });
    t.join();

    std::cout << "Result: " << result << "\n";

    return 0;
}
Result: 42

10.1.4 std::jthread (C++20)

std::jthread is std::thread with two upgrades. First, the destructor automatically joins the thread, so you never forget:

#include <iostream>
#include <thread>

int main() {
    std::jthread t([]() {
        std::cout << "Auto-joining thread\n";
    });
    // No need to call t.join(); the destructor handles it
    return 0;
}

Second, jthread cooperates with stop tokens, a built-in, type-safe way to ask a thread to stop politely. If the thread function accepts a std::stop_token as its first parameter, jthread passes one in automatically. The owning jthread exposes request_stop(), and the thread checks token.stop_requested() periodically:

#include <chrono>
#include <iostream>
#include <thread>

void worker(std::stop_token token, int id) {
    while (!token.stop_requested()) {
        std::cout << "tick from " << id << "\n";
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    std::cout << id << " stopping cleanly\n";
}

int main() {
    std::jthread w(worker, 1);
    std::this_thread::sleep_for(std::chrono::milliseconds(350));
    w.request_stop();
    // jthread destructor joins automatically
    return 0;
}

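When w goes out of scope at the end of main, its destructor calls request_stop() and then join(). The worker therefore gets a chance to finish whatever it was doing before the program ends. (The explicit request_stop() above only makes the stop happen at a moment we choose; the destructor would have requested it anyway.) This is a big improvement over the bare std::thread model, where destroying a thread that is still joinable calls std::terminate.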

Tip: Prefer std::jthread over std::thread in new code. The auto-join handles the most common bug (forgetting to join() before destruction), and the stop-token pattern is much cleaner than the old “atomic flag the worker polls” idiom.

Wut: A worker that calls a blocking operation (sleep_for, read, cv.wait) does not check stop_requested() while it blocks. For long blocks, use std::condition_variable_any::wait with a stop token, or break long sleeps into shorter polling intervals.

10.2 Mutexes and Locks

You have a data race when two threads access the same memory location, at least one of them writes, and nothing synchronizes the accesses. A data race is undefined behavior. A mutex (mutual exclusion) prevents it by ensuring only one thread accesses the protected data at a time.

10.2.1 std::mutex

#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

std::mutex mtx;
int shared_count = 0;

void increment(int times) {
    for (int i = 0; i < times; ++i) {
        mtx.lock();
        shared_count++;
        mtx.unlock();
    }
}

int main() {
    std::thread t1(increment, 100000);
    std::thread t2(increment, 100000);
    t1.join();
    t2.join();

    std::cout << "Count: " << shared_count << "\n";  // 200000

    return 0;
}
Count: 200000

Without the mutex, the program would have a data race (undefined behavior). In practice the count usually comes out below 200000: both threads can read the same value, increment it, and write it back, so one of the two increments is lost.

10.2.2 std::lock_guard

Calling lock() and unlock() manually is error-prone — if an exception is thrown between them, the mutex stays locked forever. std::lock_guard uses RAII (Chapter 9) to lock on construction and unlock on destruction:

void increment(int times) {
    for (int i = 0; i < times; ++i) {
        std::lock_guard<std::mutex> guard(mtx);
        shared_count++;
        // mutex unlocked when guard goes out of scope
    }
}

Tip: Always use lock_guard (or unique_lock, scoped_lock) instead of calling lock()/unlock() directly. This is RAII applied to mutex locking.

10.2.3 std::unique_lock

std::unique_lock is like lock_guard but more flexible — you can unlock and relock it, and it is movable:

std::unique_lock<std::mutex> lock(mtx);
// ... critical section ...
lock.unlock();
// ... non-critical work ...
lock.lock();
// ... another critical section ...

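unique_lock is required for condition variables (covered next). std::condition_variable::wait takes a std::unique_lock<std::mutex> because it has to unlock the mutex while sleeping and relock it on wakeup, which lock_guard cannot do.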

10.2.4 std::scoped_lock (C++17)

std::scoped_lock can lock multiple mutexes at once without risking deadlock:

std::mutex mtx1, mtx2;

void transfer() {
    std::scoped_lock lock(mtx1, mtx2);  // locks both, deadlock-free
    // ... modify data protected by both mutexes ...
}

If you tried to lock them separately, two threads could each lock one mutex and wait for the other — a deadlock.

scoped_lock uses a deadlock-avoidance algorithm internally. The standard does not mandate a particular strategy, but one common implementation tries each mutex in turn, and if any of them is already held by another thread, it releases the ones it holds, waits, and retries. Whatever the strategy, two threads calling transfer() from opposite directions cannot deadlock — one of them will back off and let the other go first.

If you need to lock the mutexes yourself first (for example with std::lock, which uses the same deadlock-avoidance trick), you can hand ownership to a scoped_lock using the std::adopt_lock tag:

std::lock(mtx1, mtx2);                               // lock both, deadlock-free
std::scoped_lock lock(std::adopt_lock, mtx1, mtx2);  // adopt; unlocks at scope exit

Tip: When you need to lock more than one mutex, always use scoped_lock (or, before C++17, std::lock followed by std::lock_guard objects constructed with std::adopt_lock). Hand-rolled “lock A then lock B” is the textbook way to introduce a deadlock that only manifests under load.

10.3 Condition Variables

A condition variable lets one thread wait until another thread signals that something has happened. The classic use is the producer-consumer pattern:

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

std::queue<std::string> work_queue;
std::mutex mtx;
std::condition_variable cv;
bool done = false;

void producer() {
    std::vector<std::string> songs = {"Toxic", "Crazy", "Rehab"};
    for (const auto& song : songs) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            work_queue.push(song);
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(mtx);
        done = true;
    }
    cv.notify_one();
}

void consumer() {
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, []() { return !work_queue.empty() || done; });

        while (!work_queue.empty()) {
            std::cout << "Processing: " << work_queue.front() << "\n";
            work_queue.pop();
        }

        if (done) break;
    }
}

int main() {
    std::thread prod(producer);
    std::thread cons(consumer);

    prod.join();
    cons.join();

    return 0;
}
Processing: Toxic
Processing: Crazy
Processing: Rehab

Key points about condition variables:

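  • cv.wait(lock, predicate) first checks the predicate and returns immediately if it is already true. Otherwise it atomically unlocks the mutex and sleeps. When woken, it relocks the mutex and checks the predicate again, going back to sleep if it is still false.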
  • Always use the predicate form of wait to handle spurious wakeups — the OS can wake a thread without a notification.
  • notify_one() wakes one waiting thread; notify_all() wakes all of them.

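Wut: Condition variables can experience spurious wakeups: the thread wakes up even though no one called notify. This is why you must always recheck the condition after waking. Use either the predicate form cv.wait(lock, predicate) or an equivalent while (!condition) cv.wait(lock); loop. Without it, your thread may proceed when it should still be waiting. The predicate form also checks the condition before sleeping, so it guards against the opposite mistake as well: if the other thread notified before you started waiting, you do not sleep forever.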

10.4 std::async and std::future

Threads are low-level. std::async (#include <future>) provides a higher-level way to run a task asynchronously and get its result:

#include <chrono>
#include <future>
#include <iostream>
#include <string>
#include <thread>

std::string fetch_lyrics(const std::string& song) {
    // Simulate a slow operation
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    return "Lyrics for: " + song;
}

int main() {
    // Start async task
    std::future<std::string> result = std::async(std::launch::async,
        fetch_lyrics, "Somewhere Only We Know");

    std::cout << "Doing other work...\n";

    // Get the result (blocks if not ready yet)
    std::cout << result.get() << "\n";

    return 0;
}
Doing other work...
Lyrics for: Somewhere Only We Know

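std::async returns a std::future that will hold the result. Calling future.get() blocks until the result is ready and returns it. If the async task threw an exception, get() rethrows it. You can call get() only once; afterwards the future no longer refers to a result (valid() returns false).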

10.4.1 Launch Policies

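Policy                                      Behavior
std::launch::async                          Runs on a new thread (as if by std::thread), started right away
std::launch::deferred                       Runs lazily on the calling thread when get() or wait() is first called
std::launch::async | std::launch::deferred  Implementation chooses (the default when no policy is given)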

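Tip: std::async is the easiest way to parallelize independent tasks. Use it when you need a result from a background computation. Use std::thread (or std::jthread) when you need more control over the thread’s lifetime. One catch: the destructor of a future returned by std::async blocks until the task finishes. If you discard the returned future, the call effectively runs synchronously.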

10.5 Atomics

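For simple operations on shared variables (counters, flags), mutexes are overkill. Atomic operations guarantee that each read, write, or read-modify-write happens indivisibly. For int-sized types on mainstream hardware they are implemented without locks (you can check with is_lock_free()):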

#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

std::atomic<int> counter(0);

void count_up(int times) {
    for (int i = 0; i < times; ++i) {
        counter++;  // atomic increment; no mutex needed
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) {
        threads.emplace_back(count_up, 100000);
    }
    for (auto& t : threads) {
        t.join();
    }

    std::cout << "Counter: " << counter << "\n";  // 400000

    return 0;
}
Counter: 400000

Common atomic operations:

std::atomic<int> a(0);
a.store(5);                // write
int x = a.load();          // read
int old = a.exchange(10);  // swap and return old value
a.fetch_add(1);            // atomic increment
a.fetch_sub(1);            // atomic decrement

Tip: Use std::atomic for simple shared counters and flags. For anything more complex (protecting multiple variables or a data structure), use a mutex.

10.6 Thread Safety Pitfalls

10.6.1 Data Races

A data race occurs when two threads access the same variable, at least one writes, and there is no synchronization. Data races are undefined behavior — anything can happen:

// BUG: data race on 'count'
int count = 0;

void bad_increment() {
    for (int i = 0; i < 100000; ++i) {
        count++;  // not atomic, not protected
    }
}

Fix it by guarding count with a mutex (locked through std::lock_guard) or by declaring it std::atomic<int>.

10.6.2 Deadlock

A deadlock occurs when two threads each hold a lock the other needs:

// Thread 1: lock A, then lock B
// Thread 2: lock B, then lock A
// Both wait forever!

Prevention strategies:

  • Always lock mutexes in the same order.
  • Use std::scoped_lock to lock multiple mutexes simultaneously.
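  • Keep critical sections short, and avoid calling unknown code (callbacks, virtual functions supplied by others) while holding a lock, since that code may try to acquire locks of its own.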

10.6.3 False Sharing

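False sharing happens when two threads modify different variables that happen to share a cache line. Every write by one core invalidates the other core’s copy of that line, so the line bounces back and forth between cores. Performance collapses even though the threads never touch the same variable: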

// Both on the same cache line: slow!
struct Counters {
    int a;  // thread 1 writes this
    int b;  // thread 2 writes this
};

Fix by padding each field onto its own cache line. The hard-coded 64 below works on most current x86-64 CPUs, but cache-line size varies by architecture (some ARM CPUs are 128 bytes, some older chips are 32):

struct Counters {
    alignas(64) int a;
    alignas(64) int b;
};

For portable code, use std::hardware_destructive_interference_size from <new> (C++17):

#include <new>

struct Counters {
    alignas(std::hardware_destructive_interference_size) int a;
    alignas(std::hardware_destructive_interference_size) int b;
};

The constant is the implementation’s estimate of how far apart two objects must be so that writes to one do not interfere with reads of the other — in practice, the cache-line size. Support varies: libc++ did not provide the constant until LLVM 19, and GCC defines it but warns (-Winterference-size) when its value leaks into ABI-sensitive places like public headers, because the value can change between compiler versions and -march flags. If your toolchain lacks it, fall back to 64 (or your platform’s known size) with a comment explaining why.

Wut: alignas(64) on a member forces the start of that member to land on a 64-byte boundary. Aligning b therefore makes the compiler insert padding after a, pushing b onto its own cache line. You do not need alignas on the type as well: a struct’s alignment is automatically raised to match its most-aligned member, so aligning each member you want isolated is enough.

10.7 Try It: Concurrent Counter

Here is a program that demonstrates threads, mutexes, and atomics. Type it in, compile with g++ -std=c++23 -pthread, and experiment:

#include <atomic>
#include <chrono>
#include <future>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Shared state
std::mutex mtx;
int mutex_count = 0;
std::atomic<int> atomic_count(0);

void mutex_increment(int n) {
    for (int i = 0; i < n; ++i) {
        std::lock_guard<std::mutex> lock(mtx);
        mutex_count++;
    }
}

void atomic_increment(int n) {
    for (int i = 0; i < n; ++i) {
        atomic_count++;
    }
}

template<typename Func>
long long benchmark(Func f, int threads, int per_thread) {
    auto start = std::chrono::steady_clock::now();  // steady_clock never jumps, so it suits timing

    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back(f, per_thread);
    }
    for (auto& t : workers) {
        t.join();
    }

    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

int main() {
    const int threads = 4;
    const int per_thread = 1000000;

    auto ms1 = benchmark(mutex_increment, threads, per_thread);
    std::cout << "Mutex count:  " << mutex_count
              << " (" << ms1 << " ms)\n";

    auto ms2 = benchmark(atomic_increment, threads, per_thread);
    std::cout << "Atomic count: " << atomic_count
              << " (" << ms2 << " ms)\n";

    // std::async example
    auto fut = std::async(std::launch::async, []() {
        return std::string("Async result ready");
    });

    std::cout << fut.get() << "\n";

    return 0;
}

Compare the performance of mutex-based and atomic-based counting. Then try removing the synchronization entirely. The result is a data race, so the program has undefined behavior, and in practice the count usually comes out below 4000000.

10.8 Key Points

  • std::thread creates a new thread of execution. Every thread must be join()ed or detach()ed before destruction.
  • std::jthread (C++20) automatically joins on destruction and supports cooperative cancellation through stop tokens.
  • Arguments are copied into threads by default; use std::ref() for references.
  • std::mutex provides mutual exclusion. Use std::lock_guard or std::scoped_lock (RAII) instead of manual lock()/unlock().
  • std::unique_lock is more flexible than lock_guard — it can be unlocked, relocked, and moved.
  • std::scoped_lock (C++17) locks multiple mutexes simultaneously to prevent deadlock.
  • Condition variables let threads wait for a signal. Always use the predicate form of wait to handle spurious wakeups.
  • std::async and std::future provide high-level asynchronous computation. future.get() blocks until the result is ready.
  • std::atomic provides indivisible operations for simple types (counters, flags), typically lock-free for int-sized types.
  • Data races (unsynchronized access) are undefined behavior. Use mutexes, atomics, or other synchronization primitives.
  • Deadlocks occur when threads wait for each other’s locks. Lock in consistent order or use scoped_lock.

10.9 Exercises

  1. Think about it: Why must every std::thread be either joined or detached? What happens if you destroy a joinable thread?

  2. Where is the bug?

    int total = 0;
    
    void add(int n) { total += n; }
    
    std::thread t1(add, 100);
    std::thread t2(add, 200);
    t1.join();
    t2.join();
    
    std::cout << total << "\n";

  3. Think about it: Why does std::lock_guard not have unlock() and lock() methods, while std::unique_lock does? When would you need the extra flexibility?

  4. What does this program print?

    std::atomic<int> x(0);
    
    std::thread t1([&]() { x++; x++; x++; });
    std::thread t2([&]() { x++; x++; x++; });
    t1.join();
    t2.join();
    
    std::cout << x << "\n";

  5. Where is the deadlock?

    std::mutex m1, m2;
    
    void thread_a() {
        std::lock_guard<std::mutex> lock1(m1);
        std::lock_guard<std::mutex> lock2(m2);
        // ...
    }
    
    void thread_b() {
        std::lock_guard<std::mutex> lock1(m2);
        std::lock_guard<std::mutex> lock2(m1);
        // ...
    }

    How would you fix it?

  6. Think about it: When should you use std::async instead of creating a std::thread manually? What are the advantages?

  7. What value does result hold?

    auto fut = std::async(std::launch::deferred, []() { return 6 * 7; });
    // ... other work ...
    int result = fut.get();

  8. Where is the bug?

    std::mutex mtx;
    
    void process() {
        mtx.lock();
        if (some_condition()) {
            return;  // oops
        }
        // ... more work ...
        mtx.unlock();
    }
  9. Think about it: Why is std::atomic faster than using a mutex for simple counters? When would you still prefer a mutex over an atomic?

  10. Write a program that uses four threads to compute the sum of a large vector (1 million elements). Each thread should sum one quarter of the vector. Use std::async and std::future to collect the partial sums, then print the total.

  11. Where is the bug?

    #include <atomic>
    #include <chrono>
    #include <thread>
    
    std::atomic<bool> stop_flag{false};
    
    void worker() {
        while (!stop_flag) {
            std::this_thread::sleep_for(std::chrono::seconds(60));
        }
    }
    
    int main() {
        std::thread t(worker);
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        stop_flag = true;
        t.join();
        return 0;
    }

    The program waits almost a full minute before exiting. What is the conceptual problem with the polling design, and how would you rewrite the worker using std::jthread and a stop token?

  12. Think about it: std::scoped_lock(mtx1, mtx2) uses a deadlock-avoidance algorithm internally. Why is hand-rolled “lock mtx1, then lock mtx2” prone to deadlock when two threads call it from opposite sides? Sketch the interleaving that causes the deadlock and explain why scoped_lock cannot deadlock the same way.

  13. Calculate: A std::atomic<int> counter starts at 10. Eight threads each perform 25000 counter.fetch_add(1) calls, and all eight are joined. What is the final value of counter? Would the result be deterministic if counter were a plain int instead?