6. Shared memory and atomics
Table of contents
Please support this book: buy it (PDF, EPUB, MOBI) or donate
(Ad, please don’t block.)

6. Shared memory and atomics

The ECMAScript 2017 feature “Shared memory and atomics” was designed by Lars T. Hansen. It introduces a new constructor SharedArrayBuffer and a namespace object Atomics with helper functions. This chapter explains the details.

6.1 Parallelism vs. concurrency

Before we begin, let’s clarify two terms that are similar, yet distinct: “parallelism” and “concurrency”. Many definitions for them exist; I’m using them as follows:

Both are closely related, but not the same:

However, it is difficult to use these terms precisely, which is why interchanging them is usually not a problem.

6.1.1 Models of parallelism

Two models of parallelism are:

6.2 A history of JS parallelism

6.2.1 The next step: SharedArrayBuffer

What’s next? For low-level parallelism, the direction is quite clear: support SIMD and GPUs as well as possible. However, for high-level parallelism, things are much less clear, especially after the failure of PJS.

What is needed is a way to try out many approaches, to find out how to best bring high-level parallelism to JavaScript. Following the principles of the extensible web manifesto, the proposal “shared memory and atomics” (a.k.a. “Shared Array Buffers”) does so by providing low-level primitives that can be used to implement higher-level constructs.

6.3 Shared Array Buffers

Shared Array Buffers are a primitive building block for higher-level concurrency abstractions. They allow you to share the bytes of a SharedArrayBuffer object between multiple workers and the main thread (the buffer is shared, to access the bytes, wrap it in a Typed Array). This kind of sharing has two benefits:

6.3.1 Creating and sending a Shared Array Buffer

// main.js

const worker = new Worker('worker.js');

// To be shared
const sharedBuffer = new SharedArrayBuffer( // (A)
    10 * Int32Array.BYTES_PER_ELEMENT); // 10 elements

// Share sharedBuffer with the worker
worker.postMessage({sharedBuffer}); // clone

// Local only
const sharedArray = new Int32Array(sharedBuffer); // (B)

You create a Shared Array Buffer the same way you create a normal Array Buffer: by invoking the constructor and specifying the size of the buffer in bytes (line A). What you share with workers is the buffer. For your own, local, use, you normally wrap Shared Array Buffers in Typed Arrays (line B).

Warning: Cloning a Shared Array Buffer is the correct way of sharing it, but some engines still implement an older version of the API and require you to transfer it:

worker.postMessage({sharedBuffer}, [sharedBuffer]); // transfer (deprecated)

In the final version of the API, transferring a Shared Array Buffer means that you lose access to it.

6.3.2 Receiving a Shared Array Buffer

The implementation of the worker looks as follows.

// worker.js

self.addEventListener('message', function (event) {
    const {sharedBuffer} = event.data;
    const sharedArray = new Int32Array(sharedBuffer); // (A)

    // ···
});

We first extract the Shared Array Buffer that was sent to us and then wrap it in a Typed Array (line A), so that we can use it locally.

6.4 Atomics: safely accessing shared data

6.4.1 Problem: Optimizations make code unpredictable across workers

In single threads, compilers can make optimizations that break multi-threaded code.

Take, for example the following code:

while (sharedArray[0] === 123) ;

In a single thread, the value of sharedArray[0] never changes while the loop runs (if sharedArray is an Array or Typed Array that wasn’t patched in some manner). Therefore, the code can be optimized as follows:

const tmp = sharedArray[0];
while (tmp === 123) ;

However, in a multi-threaded setting, this optimization prevents us from using this pattern to wait for changes made in another thread.

Another example is the following code:

// main.js
sharedArray[1] = 11;
sharedArray[2] = 22;

In a single thread, you can rearrange these write operations, because nothing is read in-between. For multiple threads, you get into trouble whenever you expect the writes to be done in a specific order:

// worker.js
while (sharedArray[2] !== 22) ;
console.log(sharedArray[1]); // 0 or 11

These kinds of optimizations make it virtually impossible to synchronize the activity of multiple workers operating on the same Shared Array Buffer.

6.4.2 Solution: atomics

The proposal provides the global variable Atomics whose methods have three main use cases.

6.4.2.1 Use case: synchronization

Atomics methods can be used to synchronize with other workers. For example, the following two operations let you read and write data and are never rearranged by compilers:

The idea is to use normal operations to read and write most data, while Atomics operations (load, store and others) ensure that the reading and writing is done safely. Often, you’ll use custom synchronization mechanisms, such as locks, whose implementations are based on Atomics.

This is a very simple example that always works, thanks to Atomics (I’ve omitted setting up sharedArray):

// main.js
console.log('notifying...');
Atomics.store(sharedArray, 0, 123);

// worker.js
while (Atomics.load(sharedArray, 0) !== 123) ;
console.log('notified');
6.4.2.2 Use case: waiting to be notified

Using a while loop to wait for a notification is not very efficient, which is why Atomics has operations that help:

6.4.2.3 Use case: atomic operations

Several Atomics operations perform arithmetic and can’t be interrupted while doing so, which helps with synchronization. For example:

Roughly, this operation performs:

ta[index] += value;

6.4.3 Problem: torn values

Another problematic effect with shared memory is torn values (garbage): when reading, you may see an intermediate value – neither the value before a new value was written to memory nor the new value.

Sect “Tear-Free Reads” in the spec states that there is no tear if and only if:

In other words, torn values are an issue whenever the same Shared Array Buffer is accessed via:

To avoid torn values in these cases, use Atomics or synchronize.

6.5 Shared Array Buffers in use

6.5.1 Shared Array Buffers and the run-to-completion semantics of JavaScript

JavaScript has so-called run-to-completion semantics: every function can rely on not being interrupted by another thread until it is finished. Functions become transactions and can perform complete algorithms without anyone seeing the data they operate on in an intermediate state.

Shared Array Buffers break run to completion (RTC): data a function is working on can be changed by another thread during the runtime of the function. However, code has complete control over whether or not this violation of RTC happens: if it doesn’t use Shared Array Buffers, it is safe.

This is loosely similar to how async functions violate RTC. There, you opt into a blocking operation via the keyword await.

6.5.2 Shared Array Buffers and asm.js and WebAssembly

Shared Array Buffers enable emscripten to compile pthreads to asm.js. Quoting an emscripten documentation page:

[Shared Array Buffers allow] Emscripten applications to share the main memory heap between web workers. This along with primitives for low level atomics and futex support enables Emscripten to implement support for the Pthreads (POSIX threads) API.

That is, you can compile multithreaded C and C++ code to asm.js.

Discussion on how to best bring multi-threading to WebAssembly is ongoing. Given that web workers are relatively heavyweight, it is possible that WebAssembly will introduce lightweight threads. You can also see that threads are on the roadmap for WebAssembly’s future.

6.5.3 Sharing data other than integers

At the moment, only Arrays of integers (up to 32 bits long) can be shared. That means that the only way of sharing other kinds of data is by encoding them as integers. Tools that may help include:

Eventually, there will probably be additional – higher-level – mechanisms for sharing data. And experiments will continue to figure out what these mechanisms should look like.

6.5.4 How much faster is code that uses Shared Array Buffers?

Lars T. Hansen has written two implementations of the Mandelbrot algorithm (as documented in his article “A Taste of JavaScript’s New Parallel Primitives” where you can try them out online): A serial version and a parallel version that uses multiple web workers. For up to 4 web workers (and therefore processor cores), speed-up improves almost linearly, from 6.9 frames per seconds (1 web worker) to 25.4 frames per seconds (4 web workers). More web workers bring additional performance improvements, but more modest ones.

Hansen notes that the speed-ups are impressive, but going parallel comes at the cost of the code being more complex.

6.6 Example

Let’s look at a more comprehensive example. Its code is available on GitHub, in the repository shared-array-buffer-demo. And you can run it online.

6.6.1 Using a shared lock

In the main thread, we set up shared memory so that it encodes a closed lock and send it to a worker (line A). Once the user clicks, we open the lock (line B).

// main.js

// Set up the shared memory
const sharedBuffer = new SharedArrayBuffer(
    1 * Int32Array.BYTES_PER_ELEMENT);
const sharedArray = new Int32Array(sharedBuffer);

// Set up the lock
Lock.initialize(sharedArray, 0);
const lock = new Lock(sharedArray, 0);
lock.lock(); // writes to sharedBuffer

worker.postMessage({sharedBuffer}); // (A)

document.getElementById('unlock').addEventListener(
    'click', event => {
        event.preventDefault();
        lock.unlock(); // (B)
    });

In the worker, we set up a local version of the lock (whose state is shared with the main thread via a Shared Array Buffer). In line B, we wait until the lock is unlocked. In lines A and C, we send text to the main thread, which displays it on the page for us (how it does that is not shown in the previous code fragment). That is, we are using self.postMessage() much like console.log() in these two lines.

// worker.js

self.addEventListener('message', function (event) {
    const {sharedBuffer} = event.data;
    const lock = new Lock(new Int32Array(sharedBuffer), 0);

    self.postMessage('Waiting for lock...'); // (A)
    lock.lock(); // (B) blocks!
    self.postMessage('Unlocked'); // (C)
});

It is noteworthy that waiting for the lock in line B stops the complete worker. That is real blocking, which hasn’t existed in JavaScript until now (await in async functions is an approximation).

6.6.2 Implementing a shared lock

Next, we’ll look at an ES6-ified version of a Lock implementation by Lars T. Hansen that is based on SharedArrayBuffer.

In this section, we’ll need (among others) the following Atomics function:

The implementation starts with a few constants and the constructor:

const UNLOCKED = 0;
const LOCKED_NO_WAITERS = 1;
const LOCKED_POSSIBLE_WAITERS = 2;

// Number of shared Int32 locations needed by the lock.
const NUMINTS = 1;

class Lock {

    /**
     * @param iab an Int32Array wrapping a SharedArrayBuffer
     * @param ibase an index inside iab, leaving enough room for NUMINTS
     */
    constructor(iab, ibase) {
        // OMITTED: check parameters
        this.iab = iab;
        this.ibase = ibase;
    }

The constructor mainly stores its parameters in instance properties.

The method for locking looks as follows.

/**
 * Acquire the lock, or block until we can. Locking is not recursive:
 * you must not hold the lock when calling this.
 */
lock() {
    const iab = this.iab;
    const stateIdx = this.ibase;
    var c;
    if ((c = Atomics.compareExchange(iab, stateIdx, // (A)
    UNLOCKED, LOCKED_NO_WAITERS)) !== UNLOCKED) {
        do {
            if (c === LOCKED_POSSIBLE_WAITERS // (B)
            || Atomics.compareExchange(iab, stateIdx,
            LOCKED_NO_WAITERS, LOCKED_POSSIBLE_WAITERS) !== UNLOCKED) {
                Atomics.wait(iab, stateIdx, // (C)
                    LOCKED_POSSIBLE_WAITERS, Number.POSITIVE_INFINITY);
            }
        } while ((c = Atomics.compareExchange(iab, stateIdx,
        UNLOCKED, LOCKED_POSSIBLE_WAITERS)) !== UNLOCKED);
    }
}

In line A, we change the lock to LOCKED_NO_WAITERS if its current value is UNLOCKED. We only enter the then-block if the lock is already locked (in which case compareExchange() did not change anything).

In line B (inside a do-while loop), we check if the lock is locked with waiters or not unlocked. Given that we are about to wait, the compareExchange() also switches to LOCKED_POSSIBLE_WAITERS if the current value is LOCKED_NO_WAITERS.

In line C, we wait if the lock value is LOCKED_POSSIBLE_WAITERS. The last parameter, Number.POSITIVE_INFINITY, means that waiting never times out.

After waking up, we continue the loop if we are not unlocked. compareExchange() also switches to LOCKED_POSSIBLE_WAITERS if the lock is UNLOCKED. We use LOCKED_POSSIBLE_WAITERS and not LOCKED_NO_WAITERS, because we need to restore this value after unlock() temporarily set it to UNLOCKED and woke us up.

The method for unlocking looks as follows.

    /**
     * Unlock a lock that is held.  Anyone can unlock a lock that
     * is held; nobody can unlock a lock that is not held.
     */
    unlock() {
        const iab = this.iab;
        const stateIdx = this.ibase;
        var v0 = Atomics.sub(iab, stateIdx, 1); // A

        // Wake up a waiter if there are any
        if (v0 !== LOCKED_NO_WAITERS) {
            Atomics.store(iab, stateIdx, UNLOCKED);
            Atomics.wake(iab, stateIdx, 1);
        }
    }

    // ···
}

In line A, v0 gets the value that iab[stateIdx] had before 1 was subtracted from it. The subtraction means that we go (e.g.) from LOCKED_NO_WAITERS to UNLOCKED and from LOCKED_POSSIBLE_WAITERS to LOCKED.

If the value was previously LOCKED_NO_WAITERS then it is now UNLOCKED and everything is fine (there is no one to wake up).

Otherwise, the value was either LOCKED_POSSIBLE_WAITERS or UNLOCKED. In the former case, we are now unlocked and must wake up someone (who will usually lock again). In the latter case, we must fix the illegal value created by subtraction and the wake() simply does nothing.

6.6.3 Conclusion for the example

This gives you a rough idea how locks based on SharedArrayBuffer work. Keep in mind that multithreaded code is notoriously difficult to write, because things can change at any time. Case in point: lock.js is based on a paper documenting a futex implementation for the Linux kernel. And the title of that paper is “Futexes are tricky” (PDF).

If you want to go deeper into parallel programming with Shared Array Buffers, take a look at synchronic.js and the document it is based on (PDF).

6.7 The API for shared memory and atomics

6.7.1 SharedArrayBuffer

Constructor:

Static property:

Instance properties:

6.7.2 Atomics

The main operand of Atomics functions must be an instance of Int8Array, Uint8Array, Int16Array, Uint16Array, Int32Array or Uint32Array. It must wrap a SharedArrayBuffer.

All functions perform their operations atomically. The ordering of store operations is fixed and can’t be reordered by compilers or CPUs.

6.7.2.1 Loading and storing
6.7.2.2 Simple modification of Typed Array elements

Each of the following functions changes a Typed Array element at a given index: It applies an operator to the element and a parameter and writes the result back to the element. It returns the original value of the element.

6.7.2.3 Waiting and waking

Waiting and waking requires the parameter ta to be an instance of Int32Array.

6.7.2.4 Miscellaneous

6.8 FAQ

6.8.1 What browsers support Shared Array Buffers?

At the moment, I’m aware of:

6.9 Further reading

More information on Shared Array Buffers and supporting technologies:

Other JavaScript technologies related to parallelism:

Background on parallelism:

Acknowledgement: I’m very grateful to Lars T. Hansen for reviewing this chapter and for answering my SharedArrayBuffer-related questions.

Next: 7. Object.entries() and Object.values()