WebAssembly: thread-safety and C/C++ local variables

I’m trying to understand the WebAssembly memory model, especially from this perspective: what risks am I exposed to when sharing linear memory between WebAssembly instances? The basic memory model that all C/C++ => wasm tutorials give us is as follows (the stack starts at __heap_base - 1 and grows downwards):

+-----------------------------------------------+
| ? | static data |     stack     |     heap    |
+-----------------------------------------------+
^   ^             ^               ^             ^
|   |             |               |             |
0 __global_base  __data_end     __heap_base  MAX_MEMORY

But the following fact surprised me. From https://webassembly.org/docs/security/:

Local variables with unclear static scope (e.g. are used by the address-of operator, or are of type struct and returned by value) are stored in a separate user-addressable stack in linear memory at compile time. This is an isolated memory region with fixed maximum size that is zero initialized by default.

and from https://github.com/WebAssembly/design/blob/main/Rationale.md#locals:

C/C++ makes it possible to take the address of a function’s local values and pass this pointer to callees or to other threads. Since WebAssembly’s local variables are outside the address space, C/C++ compilers implement address-taken variables by creating a separate stack data structure within linear memory. This stack is sometimes called the “aliased” stack, since it is used for variables which may be pointed to by pointers.

In other words, the stack defined from __heap_base - 1 to __data_end is an implementation artifact of C/C++ compiled modules. The “WASM stack” lives outside the linear memory. It just happens that, when you take the address of a local (for example), the compiler stores it in the “aliased stack” instead so there’s an address to take.

Doesn’t this behavior open the door to a new kind of very dangerous data race when shared memory is used?

Imagine a piece of code like this:

int calculation(int param1, int param2)
{
    if (param1 == param2 * 2)
        ++param1;
    else
        ++param2;

    return param1 / 3 + param2;
}

Here, calculation is thread-safe. However, suppose I replace calculation with this equivalent form:

int calculation(int param1, int param2)
{
    int* param = param1 == param2 * 2 ? &param1 : &param2;

    ++*param;

    return param1 / 3 + param2;
}

Depending on the compiler’s output, calculation might no longer be thread-safe if param1 and/or param2 are stored on the aliased stack, which lives in linear memory, which in turn could be shared with other instances if shared memory is enabled by the --features=atomics,bulk-memory --shared-memory flags.

So, in which exact situations can the compiler decide to store a local variable on the aliased-stack?

EDIT: I ran some tests to verify this, and I would like to know if I’m right. In a function that uses 16 unsigned local variables, I stored on the heap the memory addresses of the first, the middle and the last local variable, then printed them from JavaScript. The difference between the lowest stored address and __heap_base was 32*3 bytes + padding, not 32*16 + padding, which means that only the three variables whose addresses were taken were stored on the aliased stack. Of course, these tests are not thread-safe, because I’m storing the addresses of locals outside the function, but they illustrate the point: if, in a re-entrant function, I temporarily take the address of a local for implementation convenience and the code is complex enough that the compiler isn’t sure what I’m trying to do, it could keep the local on the aliased stack instead of optimizing the address away, making the function thread-unsafe.

Answer

In a multi-threaded setup, each thread gets its own stack in the shared memory. The stack pointer (it seems to be created by LLVM’s createSyntheticSymbols) is kept in a WebAssembly global variable. Currently these globals are used as thread-local storage, meaning that each thread has its own copy of the global.

When the WebAssembly instance starts, the main thread’s global points to the main thread’s stack in the shared memory. If you start another thread, its global is set during startup to point to a different place in the shared memory, where the stack for that thread is located.
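To make the consequence for the question concrete, here is a minimal sketch that runs the address-taking calculation from two threads at once. It uses native POSIX threads as a stand-in for a shared-memory WebAssembly build, which is an assumption on my part; struct job, worker and run_concurrently are hypothetical names, not part of any toolchain. Each thread’s param1 and param2 live on that thread’s own stack, so the calls do not interfere with each other.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* The address-taking variant from the question, unchanged in behavior. */
static int calculation(int param1, int param2)
{
    int* param = param1 == param2 * 2 ? &param1 : &param2;

    ++*param;

    return param1 / 3 + param2;
}

/* Hypothetical per-thread work item: inputs plus a slot for the result. */
struct job {
    int param1;
    int param2;
    int result;
};

/* Each worker calls calculation repeatedly; its locals sit on its own stack. */
static void* worker(void* arg)
{
    struct job* j = arg;
    assert(j != NULL);

    for (int i = 0; i < 100000; ++i)
        j->result = calculation(j->param1, j->param2);

    return NULL;
}

/* Hypothetical driver: runs both jobs concurrently, returns 0 on success. */
static int run_concurrently(struct job* first, struct job* second)
{
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, worker, first) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, second) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

With inputs (4, 2) in one thread and (3, 5) in the other, each job should end up with the same result as a single-threaded call, because nothing in calculation touches memory that the other thread can reach.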

The stack seems to be allocated by Emscripten’s __pthread_create_js if the caller does not supply its own pointer. Variables are allocated on the current stack with stackAlloc, where:

global.get __stack_pointer

reads the current thread’s stack pointer. stackAlloc then subtracts the needed bytes (the stack grows down), aligns the result to 16 bytes and stores the new value back into the global. All of that is thread-safe, because the global is only accessible from the thread itself.
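The pointer arithmetic described above can be sketched in plain C. This is only a model, not the Emscripten source: stack_pointer is a hypothetical thread-local variable standing in for the __stack_pointer global, and sketch_stack_init and sketch_stack_alloc are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread __stack_pointer global. */
static _Thread_local uintptr_t stack_pointer;

/* Hypothetical helper: point this thread's stack pointer at the top of its stack. */
static void sketch_stack_init(uintptr_t top)
{
    stack_pointer = top;
}

/* Model of the allocation: read, subtract, align down to 16, write back. */
static uintptr_t sketch_stack_alloc(size_t size)
{
    uintptr_t sp = stack_pointer;   /* corresponds to global.get __stack_pointer */

    assert(size <= sp);             /* sketch only: no real stack-limit check */
    sp -= size;                     /* the stack grows down */
    sp &= ~(uintptr_t)15;           /* round down to a 16-byte boundary */

    stack_pointer = sp;             /* corresponds to global.set __stack_pointer */
    return sp;
}
```

Starting from a top of 1024, allocating 20 bytes gives 992 (1004 rounded down to a multiple of 16), and a following 1-byte allocation gives 976. No locking is needed because no other thread can see this thread’s copy of the pointer.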

About the pointers: yes, the compiler will place variables that are accessed through pointers on an explicit stack. Currently the WebAssembly stack is not “walkable”, but there is a proposal to make it so. Many implementations also use an explicit stack to gain finer-grained control over stack usage (variables, structures and so on).

All of this “stuff” SHOULD (RFC 2119) be transparent to developers, meaning it appears to just work.


Based on your comments: the WebAssembly standard currently deals with data races through atomic instructions, whose accesses are sequentially consistent. With multi-threading, the memory allocator clearly MUST be thread-safe. The explicit per-thread stack does not have to be (using globals is enough, as described above), because each stack is managed only by its own thread. See the threads proposal for the atomic instructions and the implementation status. Atomic instructions may also be used on unshared memory.
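For data that really is shared between threads, the C-level counterpart of those atomic instructions is C11 <stdatomic.h>. The following is a minimal sketch, again using native POSIX threads as a stand-in for a shared-memory WebAssembly build; shared_counter, increment_many and count_with_two_threads are hypothetical names. The default memory order of atomic_fetch_add is sequentially consistent.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical counter living in memory that both threads can reach. */
static atomic_int shared_counter;

/* Adds 1 to the shared counter n times using an atomic read-modify-write. */
static void* increment_many(void* arg)
{
    int n = *(int*)arg;

    for (int i = 0; i < n; ++i)
        atomic_fetch_add(&shared_counter, 1);   /* seq_cst by default */

    return NULL;
}

/* Hypothetical driver: two threads increment concurrently; returns the total, or -1 on error. */
static int count_with_two_threads(int per_thread)
{
    pthread_t a, b;

    assert(per_thread >= 0);
    atomic_store(&shared_counter, 0);

    if (pthread_create(&a, NULL, increment_many, &per_thread) != 0)
        return -1;
    if (pthread_create(&b, NULL, increment_many, &per_thread) != 0) {
        pthread_join(a, NULL);
        return -1;
    }

    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return atomic_load(&shared_counter);
}
```

Because every increment is an atomic read-modify-write, no update is lost and the total is exactly twice per_thread. With a plain int and ++ the same code would be a data race, and the total could come out lower.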


Some implementations MAY lock the whole memory for non-atomic accesses as well as for atomic ones; the specification does not forbid stronger memory access guarantees. In that case, even if you create a race at some memory address, you will not read inconsistent (torn) values. However, this is only a possibility and SHOULD NOT be relied upon.