This time I use atomic_fetch_add. Here is how I think I can get ra1=1 and ra2=1: both threads execute a.fetch_add(1,memory_order_relaxed); when a=0. The writes go into the store buffer and aren’t visible to the other thread, so both threads end up with ra1=1 and ra2=1.
I can reason about how it prints 12, 21 and 22.
- 22 results from both threads incrementing a in foo and bar, with a=2 visible to both a.load calls.
- Similarly, 12 results from thread foo completing and thread bar starting after foo’s store.
- 21 results from bar running first, then foo.
```cpp
// g++ -O2 -pthread axbx.cpp ; while [ true ]; do ./a.out | grep "11"; done
// doesn't print 11 within 5 mins
#include <atomic>
#include <thread>
#include <cstdio>
using namespace std;

atomic<long> a, b;
long ra1, ra2;

void foo() {
    a.fetch_add(1, memory_order_relaxed);
    ra1 = a.load(memory_order_relaxed);
}

void bar() {
    a.fetch_add(1, memory_order_relaxed);
    ra2 = a.load(memory_order_relaxed);
}

int main() {
    thread t[2]{ thread(foo), thread(bar) };
    t[0].join(); t[1].join();
    printf("%ld%ld\n", ra1, ra2); // This doesn't print 11 but it should
}
```
Answer
a.fetch_add is atomic; that’s the whole point. There’s no way for two separate fetch_adds to step on each other and produce only a single increment.
Implementations that let the store buffer break that would not be correct implementations, because ISO C++ requires the entire RMW to be one atomic operation, not atomic-load and separate atomic-store.
(For example, on x86, lock add [a], 1 is a full barrier because of how it has to be implemented: the updated data must be visible in L1d cache as part of executing the instruction. See the Q&A "Can num++ be atomic for ‘int num’?". On some other implementations, e.g. AArch64 before the ARMv8.1 LSE atomic instructions, it will compile to an LL/SC retry loop, where the Store-Conditional will fail if this core lost exclusive ownership of the cache line between the load and the store.)