Shell scripting with Node.js

4 An overview of Node.js: architecture, APIs, event loop, concurrency



This chapter gives an overview of how Node.js works:

4.1 The Node.js platform

The following diagram provides an overview of how Node.js is structured:

The APIs available to a Node.js app consist of:

The Node.js APIs are partially implemented in JavaScript, partially in C++. The latter is needed to interface with the operating system.

Node.js runs JavaScript via an embedded V8 JavaScript engine (the same engine used by Google’s Chrome browser).

4.1.1 Global Node.js variables

These are a few highlights of Node’s global variables:

More global variables are mentioned throughout this chapter.

4.1.1.1 Using modules instead of global variables

The following built-in modules provide alternatives to global variables:

In principle, using modules is cleaner than using global variables. However, using the global variables console and process is such an established pattern that deviating from it also has downsides.

4.1.2 The built-in Node.js modules

Most of Node’s APIs are provided via modules. These are a few frequently used ones (in alphabetical order):

Module 'node:module' exports the Array builtinModules, which contains the specifiers of all built-in modules:

import * as assert from 'node:assert/strict';
import {builtinModules} from 'node:module';
// Remove internal modules (whose names start with underscores)
const modules = builtinModules.filter(m => !m.startsWith('_'));
modules.sort();
assert.deepEqual(
  modules.slice(0, 5),
  [
    'assert',
    'assert/strict',
    'async_hooks',
    'buffer',
    'child_process',
  ]
);

4.1.3 The different styles of Node.js functions

In this section, we use the following import:

import * as fs from 'node:fs';

Node’s functions come in three different styles. Let’s look at the built-in module 'node:fs' as an example:

The three examples we have just seen demonstrate the naming convention for functions with similar functionality:

Let’s take a closer look at how these three styles work.

4.1.3.1 Synchronous functions

Synchronous functions are simplest – they immediately return values and throw errors as exceptions:

try {
  const result = fs.readFileSync('/etc/passwd', {encoding: 'utf-8'});
  console.log(result);
} catch (err) {
  console.error(err);
}

4.1.3.2 Promise-based functions

Promise-based functions return Promises that are fulfilled with results and rejected with errors:

import * as fsPromises from 'node:fs/promises'; // (A)

try {
  const result = await fsPromises.readFile(
    '/etc/passwd', {encoding: 'utf-8'});
  console.log(result);
} catch (err) {
  console.error(err);
}

Note the module specifier in line A: The Promise-based API is located in a different module.

Promises are explained in more detail in “JavaScript for impatient programmers”.

4.1.3.3 Callback-based functions

Callback-based functions pass results and errors to callbacks which are their last parameters:

fs.readFile('/etc/passwd', {encoding: 'utf-8'},
  (err, result) => {
    if (err) {
      console.error(err);
      return;
    }
    console.log(result);
  }
);

This style is explained in more detail in the Node.js documentation.

4.2 The Node.js event loop

By default, Node.js executes all JavaScript in a single thread, the main thread. The main thread continuously runs the event loop – a loop that executes chunks of JavaScript. Each chunk is a callback and can be considered a cooperatively scheduled task. The first task contains the code (coming from a module or standard input) that we start Node.js with. Other tasks are usually added later, due to:

A first approximation of the event loop looks like this:

That is, the main thread runs code similar to:

while (true) { // event loop
  const task = taskQueue.dequeue(); // blocks
  task();
}

The event loop takes callbacks out of a task queue and executes them in the main thread. Dequeuing blocks (pauses the main thread) if the task queue is empty.

We’ll explore two topics later:

Why is this loop called event loop? Many tasks are added in response to events, e.g. ones sent by the operating system when input data is ready to be processed.

How are callbacks added to the task queue? These are common possibilities:

The following code shows an asynchronous callback-based operation in action. It reads a text file from the file system:

import * as fs from 'node:fs';

function handleResult(err, result) {
  if (err) {
    console.error(err);
    return;
  }
  console.log(result); // (A)
}
fs.readFile('reminder.txt', 'utf-8',
  handleResult
);
console.log('AFTER'); // (B)

This is the output:

AFTER
Don’t forget!

fs.readFile() executes the code that reads the file in another thread. In this case, the code succeeds and adds this callback to the task queue:

() => handleResult(null, 'Don’t forget!')

4.2.1 Running to completion makes code simpler

An important rule for how Node.js runs JavaScript code is: Each task finishes (“runs to completion”) before other tasks run. We can see that in the previous example: 'AFTER' in line B is logged before the result is logged in line A because the initial task finishes before the task with the invocation of handleResult() runs.

Running to completion means that task lifetimes don’t overlap and we don’t have to worry about shared data being changed in the background. That simplifies Node.js code. The next example demonstrates that. It implements a simple HTTP server:

// server.mjs
import * as http from 'node:http';

let requestCount = 1;
const server = http.createServer(
  (_req, res) => { // (A)
    res.writeHead(200);
    res.end('This is request number ' + requestCount); // (B)
    requestCount++; // (C)
  }
);
server.listen(8080);

We run this code via node server.mjs. After that, the code starts and waits for HTTP requests. We can send them by using a web browser to go to http://localhost:8080. Each time we reload that HTTP resource, Node.js invokes the callback that starts in line A. It serves a message with the current value of variable requestCount (line B) and increments it (line C).

Each invocation of the callback is a new task and variable requestCount is shared between tasks. Due to running to completion, it is easy to read and update. There is no need to synchronize with other concurrently running tasks because there aren’t any.

4.2.2 Why does Node.js code run in a single thread?

Why does Node.js code run in a single thread (with an event loop) by default? That has two benefits:

Given that some of Node’s asynchronous operations run in threads other than the main thread (more on that soon) and report back to JavaScript via the task queue, Node.js is not really single-threaded. Instead, we use a single thread (the main thread) to coordinate operations that run concurrently and asynchronously.

This concludes our first look at the event loop. Feel free to skip the remainder of this section if a superficial explanation is enough for you. Read on to learn more details.

4.2.3 The real event loop has multiple phases

The real event loop has multiple task queues from which it reads in multiple phases (you can check out some of the JavaScript code in the GitHub repository nodejs/node). The following diagram shows the most important ones of those phases:

What do the event loop phases shown in the diagram do?

Each phase runs until its queue is empty or until a maximum number of tasks has been processed. Except for “poll”, each phase waits until its next turn before it processes tasks that were added during its run.

4.2.3.1 Phase “poll”

If this phase takes longer than a system-dependent time limit, it ends and the next phase runs.

4.2.4 Next-tick tasks and microtasks

After each invoked task, a “sub-loop” runs that consists of two phases:

The sub-phases handle:

Next-tick tasks are Node.js-specific; microtasks are a cross-platform web standard (see MDN’s support table).

This sub-loop runs until both queues are empty. Tasks added during its run are processed immediately – the sub-loop does not wait until its next turn.

4.2.5 Comparing different ways of directly scheduling tasks

We can use the following functions and methods to add callbacks to one of the task queues:

It’s important to note that when timing a task via a delay, we are specifying the earliest possible time that the task will run. Node.js cannot always run a timed task at exactly the scheduled time because it can only check between tasks if any timed tasks are due. Therefore, a long-running task can cause timed tasks to be late.

4.2.5.1 Next-tick tasks and microtasks vs. normal tasks

Consider the following code:

function enqueueTasks() {
  Promise.resolve().then(() => console.log('Promise reaction 1'));
  queueMicrotask(() => console.log('queueMicrotask 1'));
  process.nextTick(() => console.log('nextTick 1'));
  setImmediate(() => console.log('setImmediate 1')); // (A)
  setTimeout(() => console.log('setTimeout 1'), 0);
  
  Promise.resolve().then(() => console.log('Promise reaction 2'));
  queueMicrotask(() => console.log('queueMicrotask 2'));
  process.nextTick(() => console.log('nextTick 2'));
  setImmediate(() => console.log('setImmediate 2')); // (B)
  setTimeout(() => console.log('setTimeout 2'), 0);
}

setImmediate(enqueueTasks);

We use setImmediate() to avoid a peculiarity of ESM modules: They are executed in microtasks, which means that if we enqueue microtasks at the top level of an ESM module, they run before next-tick tasks. As we’ll see next, that’s different in most other contexts.

This is the output of the previous code:

nextTick 1
nextTick 2
Promise reaction 1
queueMicrotask 1
Promise reaction 2
queueMicrotask 2
setTimeout 1
setTimeout 2
setImmediate 1
setImmediate 2

Observations:

4.2.5.2 Enqueuing next-tick tasks and microtasks during their phases

The next code examines what happens if we enqueue a next-tick task during the next-tick phase and a microtask during the microtask phase:

setImmediate(() => {
  setImmediate(() => console.log('setImmediate 1'));
  setTimeout(() => console.log('setTimeout 1'), 0);

  process.nextTick(() => {
    console.log('nextTick 1');
    process.nextTick(() => console.log('nextTick 2'));
  });

  queueMicrotask(() => {
    console.log('queueMicrotask 1');
    queueMicrotask(() => console.log('queueMicrotask 2'));
    process.nextTick(() => console.log('nextTick 3'));
  });
});

This is the output:

nextTick 1
nextTick 2
queueMicrotask 1
queueMicrotask 2
nextTick 3
setTimeout 1
setImmediate 1

Observations:

4.2.5.3 Starving out event loop phases

The following code explores which kinds of tasks can starve out event loop phases (prevent them from running via infinite recursion):

import * as fs from 'node:fs/promises';

function timers() { // OK
  setTimeout(() => timers(), 0);
}
function immediate() { // OK
  setImmediate(() => immediate());
}

function nextTick() { // starves I/O
  process.nextTick(() => nextTick());
}

function microtasks() { // starves I/O
  queueMicrotask(() => microtasks());
}

timers();
console.log('AFTER'); // always logged
console.log(await fs.readFile('./file.txt', 'utf-8'));

The “timers” phase and the immediate phase don’t execute tasks that are enqueued during their phases. That’s why timers() and immediate() don’t starve out fs.readFile() which reports back during the “poll” phase (there is also a Promise reaction, but let’s ignore that here).

Due to how next-tick tasks and microtasks are scheduled, both nextTick() and microtasks() prevent the output in the last line.

4.2.6 When does a Node.js app exit?

At the end of each iteration of the event loop, Node.js checks if it’s time to exit. It keeps a reference count of pending timeouts (for timed tasks):

If the reference count is zero at the end of an event loop iteration, Node.js exits.

We can see that in the following example:

function timeout(ms) {
  return new Promise(
    (resolve, _reject) => {
      setTimeout(resolve, ms); // (A)
    }
  );
}
await timeout(3_000);

Node.js waits until the Promise returned by timeout() is fulfilled. Why? Because the task we schedule in line A keeps the event loop alive.

In contrast, creating Promises does not increase the reference count:

function foreverPending() {
  return new Promise(
    (_resolve, _reject) => {}
  );
}
await foreverPending(); // (A)

In this case, execution temporarily leaves this (main) task during await in line A. At the end of the event loop, the reference count is zero and Node.js exits. However, the exit is not successful: the exit code is not 0 but 13 (“Unfinished Top-Level Await”).

We can manually control whether a timeout keeps the event loop alive: By default, tasks scheduled via setImmediate(), setInterval(), and setTimeout() keep the event loop alive as long as they are pending. setTimeout() and setInterval() return instances of class Timeout; setImmediate() returns an instance of class Immediate. Both classes have a method .unref() that changes the default so that the pending task won’t prevent Node.js from exiting. Method .ref() restores the default.

Tim Perry mentions a use case for .unref(): His library used setInterval() to repeatedly run a background task. That task prevented applications from exiting. He fixed the issue via .unref().

4.3 libuv: the cross-platform library that handles asynchronous I/O (and more) for Node.js

libuv is a library written in C that supports many platforms (Windows, macOS, Linux, etc.). Node.js uses it to handle I/O and more.

4.3.1 How libuv handles asynchronous I/O

Network I/O is asynchronous and doesn’t block the current thread. Such I/O includes:

To handle asynchronous I/O, libuv uses native kernel APIs and subscribes to I/O events (epoll on Linux; kqueue on BSD Unix incl. macOS; event ports on SunOS; IOCP on Windows). It then gets notifications when they occur. All of these activities, including the I/O itself, happen on the main thread.

4.3.2 How libuv handles blocking I/O

Some native I/O APIs are blocking (not asynchronous) – for example, file I/O and some DNS services. libuv invokes these APIs from threads in a thread pool (the so-called “worker pool”). That enables the main thread to use these APIs asynchronously.

4.3.3 libuv functionality beyond I/O

libuv helps Node.js with more than just I/O. Other functionality includes:

As an aside, libuv has its own event loop whose source code you can check out in the GitHub repository libuv/libuv (function uv_run()).

4.4 Escaping the main thread with user code

If we want to keep Node.js responsive to I/O, we should avoid performing long-running computations in main-thread tasks. There are two options for doing so:

The next subsections cover a few options for offloading.

4.4.1 Worker threads

Worker Threads implement the cross-platform Web Workers API with a few differences – e.g.:

On one hand, Worker Threads really are threads: They are more lightweight than processes and run in the same process as the main thread.

On the other hand:

For more information, see the Node.js documentation on worker threads.

4.4.2 Clusters

Cluster is a Node.js-specific API. It lets us run clusters of Node.js processes that we can use to distribute workloads. The processes are fully isolated but share server ports. They can communicate by passing JSON data over channels.

If we don’t need process isolation, we can use Worker Threads which are more lightweight.

4.4.3 Child processes

Child process is another Node.js-specific API. It lets us spawn new processes that run native commands (often via native shells). This API is covered in §12 “Running shell commands in child processes”.

4.5 Sources of this chapter

Node.js event loop:

Videos on the event loop (which refresh some of the background knowledge needed for this chapter):

libuv:

JavaScript concurrency:

4.5.1 Acknowledgement