How can code run with std::execution::par_unseq be thread-safe?

I am reading C++ Concurrency in Action. It says that when you use std::execution::par, you can use a mutex per element, as below:

#include <algorithm>
#include <execution>
#include <mutex>
#include <vector>

class X {
  mutable std::mutex m;
  int data;

 public:
  X() : data(0) {}
  int get_value() const {
    std::lock_guard guard(m);
    return data;
  }
  void increment() {
    std::lock_guard guard(m);
    ++data;
  }
};

void increment_all(std::vector<X>& v) {
  std::for_each(std::execution::par, v.begin(), v.end(),
                [](X& x) { x.increment(); });
}

But it also says that with std::execution::par_unseq you have to replace the per-element mutex with a mutex that protects the whole container:

#include <algorithm>
#include <execution>
#include <mutex>
#include <vector>

class Y {
  int data;

 public:
  Y() : data(0) {}
  int get_value() const { return data; }
  void increment() { ++data; }
};

class ProtectedY {
  std::mutex m;
  std::vector<Y> v;

 public:
  void lock() { m.lock(); }
  void unlock() { m.unlock(); }

  std::vector<Y>& get_vec() { return v; }
};

void increment_all(ProtectedY& data) {
  std::lock_guard<ProtectedY> guard(data);
  auto& v = data.get_vec();
  std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                [](Y& y) { y.increment(); });
}

But even with the second version, isn't there a data race on y.increment() between the threads running the parallel algorithm, since nothing is locked between them?

How can this second version with std::execution::par_unseq be thread-safe?

Answer

It is only thread-safe because you do not access shared data in the parallel algorithm.

The only thing being executed in parallel are the calls to y.increment(). These can happen in any order, on any thread and be arbitrarily interleaved with each other, even within a single thread. But y.increment() only accesses private data of y, and each y is distinct from all the other vector elements. So there is no opportunity for data races here, because there is no “overlap” between the individual elements.

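It would be different if increment() also accessed some global state shared by all the elements of the vector. Then there is a potential data race, and access to that shared state must be synchronized. But under the parallel unsequenced policy you cannot simply use a mutex for this. With par_unseq, invocations may be interleaved with each other even on a single thread, and the standard makes it undefined behavior to call vectorization-unsafe library functions, such as std::mutex::lock, from them. For instance, if two interleaved invocations on the same thread both tried to lock the same mutex, the second could wait forever for the first.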
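One way to handle shared state under par_unseq without any lock is to restructure the update as a reduction. The sketch below assumes the "global state" is a running total; the names `Item`, `weight`, and `total_weight` are hypothetical and not from the book. std::transform_reduce combines the per-element results itself, so the lambda never touches shared data. The commented-out mutex version is valid with std::execution::par, but not with par_unseq.

```cpp
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical element type: each item contributes a weight to a shared total.
struct Item {
  int weight;
};

// The transform lambda reads only its own element; the library combines the
// partial results with std::plus, so no lock is needed even under par_unseq.
long long total_weight(const std::vector<Item>& items) {
  return std::transform_reduce(
      std::execution::par_unseq, items.begin(), items.end(), 0LL,
      std::plus<>{},
      [](const Item& it) -> long long { return it.weight; });
}

// Valid with std::execution::par, but undefined behavior with par_unseq,
// because std::mutex::lock is vectorization-unsafe:
//
//   long long total = 0;
//   std::mutex m;
//   std::for_each(std::execution::par, items.begin(), items.end(),
//                 [&](const Item& it) {
//                   std::lock_guard<std::mutex> guard(m);
//                   total += it.weight;
//                 });
```

The reduction requires the combining operation to be associative and commutative, which integer addition is. That is what lets the library split and reorder the work freely.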

Note that a mutex used together with a parallel algorithm can guard against two different hazards. The first is synchronizing the threads that execute the std::for_each with each other. This works with the parallel policy, but not with the parallel unsequenced policy. It is not what your examples do: no data is shared between elements, so no synchronization is needed inside the loop.

The second is synchronizing the std::for_each call as a whole against other threads that might be running elsewhere in a larger application, while nothing is synchronized inside the std::for_each itself. That is what the mutexes in both of your examples do. It is valid under both policies, but with par_unseq it cannot be done with per-element mutexes, because each increment() would then lock a mutex from inside the element access function.
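The second use can be sketched as follows. It reuses the Y/ProtectedY shape from the question, with a constructor added to size the vector. The two std::thread callers in `run_two_updaters` are a hypothetical stand-in for "other threads in a larger application". Each call to increment_all holds the container mutex for the entire std::for_each, so the two calls never overlap. Inside each call, the par_unseq invocations take no lock because they touch distinct elements.

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <mutex>
#include <thread>
#include <vector>

class Y {
  int data = 0;

 public:
  int get_value() const { return data; }
  void increment() { ++data; }
};

// Whole-container protection: the mutex guards the vector as a unit, and no
// lock is taken inside the element access function.
class ProtectedY {
  std::mutex m;
  std::vector<Y> v;

 public:
  explicit ProtectedY(std::size_t n) : v(n) {}
  void lock() { m.lock(); }
  void unlock() { m.unlock(); }
  std::vector<Y>& get_vec() { return v; }
};

void increment_all(ProtectedY& data) {
  // Held by the calling thread for the whole algorithm call; the element
  // access function below never touches the mutex.
  std::lock_guard<ProtectedY> guard(data);
  auto& v = data.get_vec();
  std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                [](Y& y) { y.increment(); });
}

// Hypothetical driver: two application threads update the same container.
// The container mutex serializes the two increment_all calls.
void run_two_updaters(ProtectedY& data) {
  std::thread t1([&] { increment_all(data); });
  std::thread t2([&] { increment_all(data); });
  t1.join();
  t2.join();
}
```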