This chapter gives an overview of how Node.js works:
The following diagram provides an overview of how Node.js is structured:
The APIs available to a Node.js app consist of:
Global variables: fetch and CompressionStream fall into this category, as does process.
Built-in modules such as 'node:path' (functions and constants for handling file system paths) and 'node:fs' (functionality related to the file system).
The Node.js APIs are partially implemented in JavaScript, partially in C++. The latter is needed to interface with the operating system.
Node.js runs JavaScript via an embedded V8 JavaScript engine (the same engine used by Google’s Chrome browser).
These are a few highlights of Node’s global variables:
crypto gives us access to a web-compatible crypto API.
console has much overlap with the same global variable in browsers (console.log() etc.).
fetch() lets us use the Fetch browser API.
process contains an instance of class Process and gives us access to command line arguments, standard input, standard output, and more.
structuredClone() is a browser-compatible function for cloning objects.
URL is a browser-compatible class for handling URLs.
More global variables are mentioned throughout this chapter.
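As a brief illustration of the globals listed above, the following sketch uses structuredClone(), URL and process without importing anything. The object contents and the URL are made-up values chosen only for this example.

```javascript
// structuredClone(), URL and process are globals: no import is needed.
// The data below is hypothetical and only serves as an illustration.
const original = {created: new Date(0), nested: {list: [1, 2, 3]}};
const copy = structuredClone(original);

// The clone is deep: changing it does not affect the original
copy.nested.list.push(4);
console.log(original.nested.list.length); // 3
console.log(copy.created instanceof Date); // true

const url = new URL('https://example.com/docs/intro.html?lang=en');
console.log(url.pathname); // '/docs/intro.html'
console.log(url.searchParams.get('lang')); // 'en'

// Command line arguments that follow the path of the script
console.log(process.argv.slice(2));
```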
The following built-in modules provide alternatives to global variables:
'node:console' is an alternative to the global variable console:
// Via the global variable
console.log('Hello!');
// Via the module
import {log} from 'node:console';
log('Hello!');
'node:process' is an alternative to the global variable process:
// Via the global variable
console.log(process.argv);
// Via the module
import {argv} from 'node:process';
console.log(argv);
In principle, using modules is cleaner than using global variables. However, using the global variables console and process is such an established pattern that deviating from it also has downsides.
Most of Node’s APIs are provided via modules. These are a few frequently used ones (in alphabetical order):
'node:assert/strict': Assertions are functions that check if a condition is met and report an error if not. They can be used in application code and for unit testing. This is an example of using this API:
import * as assert from 'node:assert/strict';
assert.equal(3 + 4, 7);
assert.equal('abc'.toUpperCase(), 'ABC');
assert.deepEqual({prop: true}, {prop: true}); // deep comparison
assert.notEqual({prop: true}, {prop: true}); // shallow comparison
'node:child_process' is for running native commands synchronously or in separate processes. This module is described in §12 “Running shell commands in child processes”.
'node:fs' provides file system operations such as reading, writing, copying and deleting files and directories. For more information, see §8 “Working with the file system on Node.js”.
'node:os' contains operating-system-specific constants and utility functions. Some of them are explained in §7 “Working with file system paths and file URLs on Node.js”.
'node:path' is a cross-platform API for working with file system paths. It is described in §7 “Working with file system paths and file URLs on Node.js”.
'node:stream' contains a Node.js-specific streams API which is explained in §9 “Native Node.js streams”.
'node:util' contains various utility functions.
Module 'node:module' exports the Array builtinModules, which contains the specifiers of all built-in modules:
import * as assert from 'node:assert/strict';
import {builtinModules} from 'node:module';
// Remove internal modules (whose names start with underscores)
const modules = builtinModules.filter(m => !m.startsWith('_'));
modules.sort();
assert.deepEqual(
  modules.slice(0, 5),
  [
    'assert',
    'assert/strict',
    'async_hooks',
    'buffer',
    'child_process',
  ]
);
In this section, we use the following import:
import * as fs from 'node:fs';
Node’s functions come in three different styles. Let’s look at the built-in module 'node:fs' as an example:
The three examples we have just seen demonstrate the naming convention for functions with similar functionality:
Callback-based: fs.readFile()
Promise-based: fsPromises.readFile()
Synchronous: fs.readFileSync()
Let’s take a closer look at how these three styles work.
Synchronous functions are simplest – they immediately return values and throw errors as exceptions:
try {
  const result = fs.readFileSync('/etc/passwd', {encoding: 'utf-8'});
  console.log(result);
} catch (err) {
  console.error(err);
}
Promise-based functions return Promises that are fulfilled with results and rejected with errors:
import * as fsPromises from 'node:fs/promises'; // (A)
try {
  const result = await fsPromises.readFile(
    '/etc/passwd', {encoding: 'utf-8'});
  console.log(result);
} catch (err) {
  console.error(err);
}
Note the module specifier in line A: The Promise-based API is located in a different module.
Promises are explained in more detail in “JavaScript for impatient programmers”.
Callback-based functions pass results and errors to callbacks which are their last parameters:
fs.readFile('/etc/passwd', {encoding: 'utf-8'},
  (err, result) => {
    if (err) {
      console.error(err);
      return;
    }
    console.log(result);
  }
);
This style is explained in more detail in the Node.js documentation.
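To compare the three styles side by side, here is a minimal sketch that reads the same file in each of them. It is not taken from the chapter: the scratch directory and the file name demo.txt are hypothetical, chosen so that the example does not depend on /etc/passwd existing. The callback-based call is wrapped in a Promise only so that the result can be awaited next to the other two.

```javascript
import * as fs from 'node:fs';
import * as fsPromises from 'node:fs/promises';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical scratch file, created only for this illustration
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'styles-'));
const filePath = path.join(dir, 'demo.txt');
fs.writeFileSync(filePath, 'hello', {encoding: 'utf-8'});

// Synchronous style: the result is returned directly
const viaSync = fs.readFileSync(filePath, {encoding: 'utf-8'});

// Promise-based style: the result is the fulfillment value
const viaPromise = await fsPromises.readFile(filePath, {encoding: 'utf-8'});

// Callback-based style: the result is passed to the last parameter
const viaCallback = await new Promise((resolve, reject) => {
  fs.readFile(filePath, {encoding: 'utf-8'}, (err, result) => {
    if (err) {
      reject(err);
      return;
    }
    resolve(result);
  });
});

console.log(viaSync, viaPromise, viaCallback); // hello hello hello

// Clean up the scratch directory
fs.rmSync(dir, {recursive: true});
```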
By default, Node.js executes all JavaScript in a single thread, the main thread. The main thread continuously runs the event loop – a loop that executes chunks of JavaScript. Each chunk is a callback and can be considered a cooperatively scheduled task. The first task contains the code (coming from a module or standard input) that we start Node.js with. Other tasks are usually added later, due to:
A first approximation of the event loop looks like this:
That is, the main thread runs code similar to:
while (true) { // event loop
const task = taskQueue.dequeue(); // blocks
task();
}
The event loop takes callbacks out of a task queue and executes them in the main thread. Dequeuing blocks (pauses the main thread) if the task queue is empty.
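The loop above is pseudocode. As a rough, runnable approximation, the following toy model uses a plain Array as the task queue. The names enqueue() and runToyEventLoop() are made up for this sketch, and, unlike the real event loop, the toy loop stops when the queue is empty instead of blocking.

```javascript
// Toy model of a task queue (all names are hypothetical)
const taskQueue = [];
const log = [];

function enqueue(task) {
  taskQueue.push(task);
}

function runToyEventLoop() {
  // The real loop would block here when the queue is empty; this one stops
  while (taskQueue.length > 0) {
    const task = taskQueue.shift();
    task(); // each task runs to completion before the next one starts
  }
}

enqueue(() => {
  log.push('first task: start');
  enqueue(() => log.push('task added by first task'));
  log.push('first task: end');
});
enqueue(() => log.push('second task'));

runToyEventLoop();
console.log(log);
// [
//   'first task: start',
//   'first task: end',
//   'second task',
//   'task added by first task'
// ]
```

A task that is added while another task runs has to wait its turn at the end of the queue.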
We’ll explore two topics later:
Why is this loop called event loop? Many tasks are added in response to events, e.g. ones sent by the operating system when input data is ready to be processed.
How are callbacks added to the task queue? These are common possibilities:
The following code shows an asynchronous callback-based operation in action. It reads a text file from the file system:
import * as fs from 'node:fs';
function handleResult(err, result) {
  if (err) {
    console.error(err);
    return;
  }
  console.log(result); // (A)
}
fs.readFile('reminder.txt', 'utf-8',
  handleResult
);
console.log('AFTER'); // (B)
This is the output:
AFTER
Don’t forget!
fs.readFile() executes the code that reads the file in another thread. In this case, the code succeeds and adds this callback to the task queue:
() => handleResult(null, 'Don’t forget!')
An important rule for how Node.js runs JavaScript code is: Each task finishes (“runs to completion”) before other tasks run. We can see that in the previous example: 'AFTER' in line B is logged before the result is logged in line A because the initial task finishes before the task with the invocation of handleResult() runs.
Running to completion means that task lifetimes don’t overlap and we don’t have to worry about shared data being changed in the background. That simplifies Node.js code. The next example demonstrates that. It implements a simple HTTP server:
// server.mjs
import * as http from 'node:http';
let requestCount = 1;
const server = http.createServer(
  (_req, res) => { // (A)
    res.writeHead(200);
    res.end('This is request number ' + requestCount); // (B)
    requestCount++; // (C)
  }
);
server.listen(8080);
We run this code via node server.mjs. After that, the code starts and waits for HTTP requests. We can send them by using a web browser to go to http://localhost:8080. Each time we reload that HTTP resource, Node.js invokes the callback that starts in line A. It serves a message with the current value of variable requestCount (line B) and increments it (line C).
Each invocation of the callback is a new task and variable requestCount is shared between tasks. Due to running to completion, it is easy to read and update. There is no need to synchronize with other concurrently running tasks because there aren’t any.
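Instead of a browser, we can also exercise such a server from the same script. The following sketch is a variation of the server, not the chapter's own code. It assumes a Node.js version that has the global fetch() and server.closeAllConnections(), and it listens on port 0, which asks the operating system for any free port, instead of on port 8080.

```javascript
import * as http from 'node:http';

let requestCount = 1;
const server = http.createServer((_req, res) => {
  res.writeHead(200);
  res.end('This is request number ' + requestCount);
  requestCount++;
});

// Assumption for this sketch: port 0 lets the operating system pick a free port
await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
const {port} = server.address();

// Each request is handled by a separate task that sees the shared counter
const responses = [];
for (let i = 0; i < 2; i++) {
  const response = await fetch(`http://127.0.0.1:${port}/`);
  responses.push(await response.text());
}
console.log(responses);
// [ 'This is request number 1', 'This is request number 2' ]

// Stop listening and drop keep-alive connections so that Node.js can exit
server.close();
server.closeAllConnections();
```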
Why does Node.js code run in a single thread (with an event loop) by default? That has two benefits:
As we have already seen, sharing data between tasks is simpler if there is only a single thread.
In traditional multi-threaded code, an operation that takes longer to complete blocks the current thread until the operation is finished. Examples of such operations are reading a file or processing HTTP requests. Performing many of these operations is expensive because we have to create a new thread each time. With an event loop, the per-operation cost is lower, especially if each operation doesn’t do much. That’s why event-loop-based web servers can handle higher loads than thread-based ones.
Given that some of Node’s asynchronous operations run in threads other than the main thread (more on that soon) and report back to JavaScript via the task queue, Node.js is not really single-threaded. Instead, we use a single thread (the main thread) to coordinate operations that run concurrently and asynchronously.
This concludes our first look at the event loop. Feel free to skip the remainder of this section if a superficial explanation is enough for you. Read on to learn more details.
The real event loop has multiple task queues from which it reads in multiple phases (you can check out some of the JavaScript code in the GitHub repository nodejs/node). The following diagram shows the most important ones of those phases:
What do the event loop phases shown in the diagram do?
Phase “timers” invokes timed tasks that were added to its queue by:
setTimeout(task, delay=1) runs the callback task after delay milliseconds.
setInterval(task, delay=1) runs the callback task repeatedly, with pauses lasting delay milliseconds.
Phase “poll” retrieves and processes I/O events and runs I/O-related tasks from its queue.
Phase “check” (the “immediate phase”) executes tasks scheduled via:
setImmediate(task) runs the callback task as soon as possible (“immediately” after phase “poll”).
Each phase runs until its queue is empty or until a maximum number of tasks has been processed. Except for “poll”, each phase waits until its next turn before it processes tasks that were added during its run.
Once the queue of phase “poll” is empty: If there are setImmediate() tasks, processing advances to the “check” phase.
If this phase takes longer than a system-dependent time limit, it ends and the next phase runs.
After each invoked task, a “sub-loop” runs that consists of two phases:
The sub-phases handle:
Next-tick tasks, which are enqueued via process.nextTick().
Microtasks, which are enqueued via queueMicrotask(), Promise reactions, etc.
Next-tick tasks are Node.js-specific; microtasks are a cross-platform web standard (see MDN’s support table).
This sub-loop runs until both queues are empty. Tasks added during its run are processed immediately: the sub-loop does not wait until its next turn.
We can use the following functions and methods to add callbacks to one of the task queues:
setTimeout() (web standard)
setInterval() (web standard)
setImmediate() (Node.js-specific)
process.nextTick() (Node.js-specific)
queueMicrotask() (web standard)
It’s important to note that when timing a task via a delay, we are specifying the earliest possible time that the task will run. Node.js cannot always run timed tasks at exactly the scheduled time because it can only check between tasks whether any timed tasks are due. Therefore, a long-running task can cause timed tasks to be late.
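The following sketch illustrates how a long-running task delays a timed task. The concrete numbers (a delay of 10 milliseconds and a busy loop of about 100 milliseconds) are arbitrary values picked for this example.

```javascript
const start = Date.now();

// Request a timed task after 10 ms and record when it actually runs
const elapsedWhenTimerRan = new Promise((resolve) => {
  setTimeout(() => resolve(Date.now() - start), 10);
});

// Long-running task: block the main thread for about 100 ms
while (Date.now() - start < 100) {
  // busy waiting, for demonstration purposes only
}

elapsedWhenTimerRan.then((elapsed) => {
  // elapsed is at least 100, even though we asked for 10
  console.log(`Timer requested after 10 ms, ran after ${elapsed} ms`);
});
```

The timed task can only run once the current task has finished, which is why it is late by roughly the duration of the busy loop.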
Consider the following code:
function enqueueTasks() {
Promise.resolve().then(() => console.log('Promise reaction 1'));
queueMicrotask(() => console.log('queueMicrotask 1'));
process.nextTick(() => console.log('nextTick 1'));
setImmediate(() => console.log('setImmediate 1')); // (A)
setTimeout(() => console.log('setTimeout 1'), 0);
Promise.resolve().then(() => console.log('Promise reaction 2'));
queueMicrotask(() => console.log('queueMicrotask 2'));
process.nextTick(() => console.log('nextTick 2'));
setImmediate(() => console.log('setImmediate 2')); // (B)
setTimeout(() => console.log('setTimeout 2'), 0);
}
setImmediate(enqueueTasks);
We use setImmediate() to avoid a peculiarity of ESM modules: They are executed in microtasks, which means that if we enqueue microtasks at the top level of an ESM module, they run before next-tick tasks. As we’ll see next, that’s different in most other contexts.
This is the output of the previous code:
nextTick 1
nextTick 2
Promise reaction 1
queueMicrotask 1
Promise reaction 2
queueMicrotask 2
setTimeout 1
setTimeout 2
setImmediate 1
setImmediate 2
Observations:
All next-tick tasks are executed immediately after enqueueTasks().
They are followed by all microtasks, including Promise reactions.
Phase “timers” comes after the immediate phase. That’s when the timed tasks are executed.
We have added immediate tasks during the immediate (“check”) phase (line A and line B). They show up last in the output, which means that they were not executed during the current phase, but during the next immediate phase.
The next code examines what happens if we enqueue a next-tick task during the next-tick phase and a microtask during the microtask phase:
setImmediate(() => {
  setImmediate(() => console.log('setImmediate 1'));
  setTimeout(() => console.log('setTimeout 1'), 0);
  process.nextTick(() => {
    console.log('nextTick 1');
    process.nextTick(() => console.log('nextTick 2'));
  });
  queueMicrotask(() => {
    console.log('queueMicrotask 1');
    queueMicrotask(() => console.log('queueMicrotask 2'));
    process.nextTick(() => console.log('nextTick 3'));
  });
});
This is the output:
nextTick 1
nextTick 2
queueMicrotask 1
queueMicrotask 2
nextTick 3
setTimeout 1
setImmediate 1
Observations:
Next-tick tasks are executed first.
“nextTick 2” is enqueued during the next-tick phase and immediately executed. Execution only continues once the next-tick queue is empty.
The same is true for microtasks.
We enqueue “nextTick 3” during the microtask phase and execution loops back to the next-tick phase. These subphases are repeated until both their queues are empty. Only then does execution move on to the next global phases: First the “timers” phase (“setTimeout 1”). Then the immediate phase (“setImmediate 1”).
The following code explores which kinds of tasks can starve out event loop phases (prevent them from running via infinite recursion):
import * as fs from 'node:fs/promises';
function timers() { // OK
  setTimeout(() => timers(), 0);
}
function immediate() { // OK
  setImmediate(() => immediate());
}
function nextTick() { // starves I/O
process.nextTick(() => nextTick());
}
function microtasks() { // starves I/O
queueMicrotask(() => microtasks());
}
timers();
console.log('AFTER'); // always logged
console.log(await fs.readFile('./file.txt', 'utf-8'));
The “timers” phase and the immediate phase don’t execute tasks that are enqueued during their phases. That’s why timers() and immediate() don’t starve out fs.readFile() which reports back during the “poll” phase (there is also a Promise reaction, but let’s ignore that here).
Due to how next-tick tasks and microtasks are scheduled, both nextTick() and microtasks() prevent the output in the last line.
At the end of each iteration of the event loop, Node.js checks if it’s time to exit. It keeps a reference count of pending timeouts (for timed tasks):
Scheduling a task via setImmediate(), setInterval(), or setTimeout() increases the reference count.
If the reference count is zero at the end of an event loop iteration, Node.js exits.
We can see that in the following example:
function timeout(ms) {
  return new Promise(
    (resolve, _reject) => {
      setTimeout(resolve, ms); // (A)
    }
  );
}
await timeout(3_000);
Node.js waits until the Promise returned by timeout() is fulfilled. Why? Because the task we schedule in line A keeps the event loop alive.
In contrast, creating Promises does not increase the reference count:
function foreverPending() {
  return new Promise(
    (_resolve, _reject) => {}
  );
}
await foreverPending(); // (A)
In this case, execution temporarily leaves this (main) task during await in line A. At the end of the event loop, the reference count is zero and Node.js exits. However, the exit is not successful. That is, the exit code is not 0, it is 13 (“Unfinished Top-Level Await”).
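One way to observe that exit code is to run such a module in a child process and inspect its status. The following is a sketch under assumptions: the scratch directory and the file name forever-pending.mjs are hypothetical, and the script content is a shortened variant of foreverPending().

```javascript
import {spawnSync} from 'node:child_process';
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical scratch module whose top-level await never settles
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'tla-'));
const script = path.join(dir, 'forever-pending.mjs');
fs.writeFileSync(script, 'await new Promise(() => {});\n');

// Run the module with the same Node.js binary that runs this code
const {status} = spawnSync(process.execPath, [script]);
console.log(status); // 13

// Clean up the scratch directory
fs.rmSync(dir, {recursive: true});
```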
We can manually control whether a timeout keeps the event loop alive: By default, tasks scheduled via setImmediate(), setInterval(), and setTimeout() keep the event loop alive as long as they are pending. setInterval() and setTimeout() return instances of class Timeout; setImmediate() returns an instance of class Immediate. Both classes have a method .unref() that changes the default so that the pending task won’t prevent Node.js from exiting. Method .ref() restores the default.
Tim Perry mentions a use case for .unref(): His library used setInterval() to repeatedly run a background task. That task prevented applications from exiting. He fixed the issue via .unref().
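A minimal sketch of that pattern follows. The heartbeat callback and the interval of 60 seconds are invented for this illustration; method .hasRef() tells us whether a timeout currently keeps the event loop alive.

```javascript
// Hypothetical background task that should not keep the process alive
const interval = setInterval(() => console.log('heartbeat'), 60_000);

const refBefore = interval.hasRef();
interval.unref();
const refAfter = interval.hasRef();

console.log(refBefore); // true
console.log(refAfter); // false
// Node.js exits here, even though the interval was never cleared
```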
libuv is a library written in C that supports many platforms (Windows, macOS, Linux, etc.). Node.js uses it to handle I/O and more.
Network I/O is asynchronous and doesn’t block the current thread. Such I/O includes:
To handle asynchronous I/O, libuv uses native kernel APIs and subscribes to I/O events (epoll on Linux; kqueue on BSD Unix incl. macOS; event ports on SunOS; IOCP on Windows). It then gets notifications when they occur. All of these activities, including the I/O itself, happen on the main thread.
Some native I/O APIs are blocking (not asynchronous) – for example, file I/O and some DNS services. libuv invokes these APIs from threads in a thread pool (the so-called “worker pool”). That enables the main thread to use these APIs asynchronously.
libuv helps Node.js with more than just I/O. Other functionality includes:
As an aside, libuv has its own event loop whose source code you can check out in the GitHub repository libuv/libuv (function uv_run()).
If we want to keep Node.js responsive to I/O, we should avoid performing long-running computations in main-thread tasks. There are two options for doing so:
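We can split the computation into smaller pieces and run each piece via setImmediate(). That enables the event loop to perform I/O between the pieces.
We can offload the computation to a different thread or process.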
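The idea of running a computation piece by piece via setImmediate() can be sketched as follows. The function sumInChunks(), the chunk size and the log are hypothetical and only serve to show that another task gets a turn between two pieces.

```javascript
const log = [];

// Sums the integers 0..n-1 in chunks. Between chunks, control returns
// to the event loop so that other tasks can run.
function sumInChunks(n, chunkSize) {
  return new Promise((resolve) => {
    let sum = 0;
    let i = 0;
    function runChunk() {
      log.push('chunk');
      const end = Math.min(i + chunkSize, n);
      for (; i < end; i++) {
        sum += i;
      }
      if (i < n) {
        setImmediate(runChunk); // continue in a later “check” phase
      } else {
        resolve(sum);
      }
    }
    runChunk(); // the first chunk runs synchronously
  });
}

const sumPromise = sumInChunks(1_000_000, 250_000);
// Another task that wants to run while the computation is in progress
setImmediate(() => log.push('other task'));

sumPromise.then((sum) => {
  console.log(sum); // 499999500000
  console.log(log); // [ 'chunk', 'chunk', 'other task', 'chunk', 'chunk' ]
});
```

The other task runs after the second chunk because immediate tasks that are enqueued during the “check” phase have to wait until the next “check” phase.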
The next subsections cover a few options for offloading.
Worker Threads implement the cross-platform Web Workers API with a few differences – e.g.:
Worker Threads have to be imported from a module, Web Workers are accessed via a global variable.
Inside a worker, listening to messages and posting messages is done via methods of the global object in browsers. On Node.js, we import parentPort instead.
We can use most Node.js APIs from workers. In browsers, our choice is more limited (we can’t use the DOM, etc.).
On Node.js, more objects are transferable (all objects whose classes extend the internal class JSTransferable) than in browsers.
On one hand, Worker Threads really are threads: They are more lightweight than processes and run in the same process as the main thread.
On the other hand:
Atomics offers atomic operations and synchronization primitives that help when using SharedArrayBuffers.
For more information, see the Node.js documentation on worker threads.
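Here is a minimal sketch of offloading a computation to a Worker Thread. To keep it in a single file, it passes the worker's source code as a string and uses the option eval: true. Such code is evaluated as a CommonJS script, which is why the worker uses require(). The function sumInWorker() and the computation itself are made up for this example.

```javascript
import {Worker} from 'node:worker_threads';

// Source code of the worker, evaluated as CommonJS (hence require())
const workerSource = `
  const {parentPort} = require('node:worker_threads');
  parentPort.on('message', (n) => {
    let sum = 0;
    for (let i = 1; i <= n; i++) {
      sum += i;
    }
    parentPort.postMessage(sum);
  });
`;

// Hypothetical helper: computes 1 + 2 + ... + n in a worker
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {eval: true});
    worker.once('message', (result) => {
      resolve(result);
      worker.terminate(); // otherwise the listening worker keeps running
    });
    worker.once('error', reject);
    worker.postMessage(n);
  });
}

const result = await sumInWorker(1_000);
console.log(result); // 500500
```

While the worker computes, the main thread remains free to run other tasks.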
Cluster is a Node.js-specific API. It lets us run clusters of Node.js processes that we can use to distribute workloads. The processes are fully isolated but share server ports. They can communicate by passing JSON data over channels.
If we don’t need process isolation, we can use Worker Threads which are more lightweight.
Child process is another Node.js-specific API. It lets us spawn new processes that run native commands (often via native shells). This API is covered in §12 “Running shell commands in child processes”.
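As a small taste of that API, the following sketch runs a second Node.js process and captures its output. Using process.execPath as the command is a choice made for this example so that it does not depend on any particular shell command being installed.

```javascript
import {execFileSync} from 'node:child_process';

// Run another Node.js process that evaluates a short piece of code
const output = execFileSync(
  process.execPath,
  ['--eval', 'console.log(6 * 7)'],
  {encoding: 'utf-8'}
);
console.log(output.trim()); // 42
```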