Exploring JavaScript (ES2025 Edition)

22 Strings

22.1 Cheat sheet: strings

Strings are primitive values in JavaScript and immutable. That is, string-related operations always produce new strings and never change existing strings.

22.1.1 Working with strings

Literals for strings:

const str1 = 'Don\'t say "goodbye"'; // string literal
const str2 = "Don't say \"goodbye\""; // string literal
assert.equal(
  `As easy as ${123}!`, // template literal
  'As easy as 123!',
);

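Backslashes are used to escape literal delimiters (as in the first two lines of the previous example) and to represent special characters such as \\ (backslash), \n (newline), \r (carriage return) and \t (tab).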

Inside a String.raw tagged template (line A), backslashes are treated as normal characters:

assert.equal(
  String.raw`\ \n\t`, // (A)
  '\\ \\n\\t',
);

Converting values to strings:

> String(undefined)
'undefined'
> String(null)
'null'
> String(123.45)
'123.45'
> String(true)
'true'

Copying parts of a string:

// There is no type for characters;
// reading characters produces strings:
const str3 = 'abc';
assert.equal(
  str3[2], 'c' // no negative indices allowed
);
assert.equal(
  str3.at(-1), 'c' // negative indices allowed
);

// Copying more than one character:
assert.equal(
  'abc'.slice(0, 2), 'ab'
);

Concatenating strings:

assert.equal(
  'I bought ' + 3 + ' apples',
  'I bought 3 apples',
);

let str = '';
str += 'I bought ';
str += 3;
str += ' apples';
assert.equal(
  str, 'I bought 3 apples',
);

22.1.2 JavaScript characters vs. code points vs. grapheme clusters

Example – a grapheme cluster that consists of multiple code points:

const graphemeCluster = '😵‍💫';
assert.equal(
  // 5 JavaScript characters
  '😵‍💫'.length, 5
);
assert.deepEqual(
  // Iteration splits into code points
  Array.from(graphemeCluster),
  ['😵', '\u200D', '💫']
);

For more information on how to handle text, see “Atoms of text: code points, JavaScript characters, grapheme clusters” (§22.7).

22.1.3 String methods

This subsection gives a brief overview of the string API. There is a more comprehensive quick reference at the end of this chapter.

Finding substrings:

> 'abca'.includes('a')
true
> 'abca'.startsWith('ab')
true
> 'abca'.endsWith('ca')
true

> 'abca'.indexOf('a')
0
> 'abca'.lastIndexOf('a')
3

Splitting and joining:

assert.deepEqual(
  'a, b,c'.split(/, ?/),
  ['a', 'b', 'c']
);
assert.equal(
  ['a', 'b', 'c'].join(', '),
  'a, b, c'
);

Padding and trimming:

> '7'.padStart(3, '0')
'007'
> 'yes'.padEnd(6, '!')
'yes!!!'

> '\t abc\n '.trim()
'abc'
> '\t abc\n '.trimStart()
'abc\n '
> '\t abc\n '.trimEnd()
'\t abc'

Repeating and changing case:

> '*'.repeat(5)
'*****'
> '= b2b ='.toUpperCase()
'= B2B ='
> 'ΑΒΓ'.toLowerCase()
'αβγ'

22.2 Plain string literals

Plain string literals are delimited by either single quotes or double quotes:

const str1 = 'abc';
const str2 = "abc";
assert.equal(str1, str2);

Single quotes are used more often because they make it easier to mention HTML, where double quotes are preferred.

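The next chapter covers template literals, which give us string interpolation, multi-line strings, and raw string literals (where the backslash has no special meaning).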

22.2.1 Escaping

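The backslash lets us create special characters such as line breaks ('\n' on Unix, '\r\n' on Windows), tabs ('\t'), and the backslash itself ('\\').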

The backslash also lets us use the delimiter of a string literal inside that literal:

assert.equal(
  'She said: "Let\'s go!"',
  "She said: \"Let's go!\""
);

22.3 Accessing JavaScript characters

JavaScript has no extra data type for characters – characters are always represented as strings.

const str = 'abc';

// Reading a JavaScript character at a given index
assert.equal(str[1], 'b');

// Counting the JavaScript characters in a string:
assert.equal(str.length, 3);

The characters we see on screen are called grapheme clusters. Most of them are represented by single JavaScript characters. However, there are also grapheme clusters (especially emojis) that are represented by multiple JavaScript characters:

> '🙂'.length
2

How that works is explained in “Atoms of text: code points, JavaScript characters, grapheme clusters” (§22.7).

22.4 String concatenation

22.4.1 String concatenation via +

If at least one operand is a string, the plus operator (+) converts any non-strings to strings and concatenates the result:

assert.equal(3 + ' times ' + 4, '3 times 4');

The assignment operator += is useful if we want to assemble a string, piece by piece:

let str = ''; // must be `let`!
str += 'Say it';
str += ' one more';
str += ' time';

assert.equal(str, 'Say it one more time');

Concatenating via + is efficient

Using + to assemble strings is quite efficient because most JavaScript engines internally optimize it.

Exercise: Concatenating strings

exercises/strings/concat_string_array_test.mjs

22.4.2 Concatenating via Arrays (.push() and .join())

Occasionally, taking a detour via an Array can be useful for concatenating strings – especially if there is to be a separator between them (such as ', ' in line A):

function getPackingList(isAbroad = false, days = 1) {
  const items = [];
  items.push('tooth brush');
  if (isAbroad) {
    items.push('passport');
  }
  if (days > 3) {
    items.push('water bottle');
  }
  return items.join(', '); // (A)
}
assert.equal(
  getPackingList(),
  'tooth brush'
);
assert.equal(
  getPackingList(true, 7),
  'tooth brush, passport, water bottle'
);

22.5 Converting values to strings in JavaScript has pitfalls

Converting values to strings in JavaScript is more complicated than it might seem: most approaches fail for some values, and the default results often hide information.

22.5.1 Example: code that is problematic

Can you spot the problem in the following code?

class UnexpectedValueError extends Error {
  constructor(value) {
    super('Unexpected value: ' + value); // (A)
  }
}

For some values, this code throws an exception in line A:

> new UnexpectedValueError(Symbol())
TypeError: Cannot convert a Symbol value to a string
> new UnexpectedValueError({__proto__:null})
TypeError: Cannot convert object to primitive value

Read on for more information.

22.5.2 Four common ways of converting values to strings

  1. String(v)
  2. v.toString()
  3. '' + v
  4. `${v}`

The following table shows how these operations fare with various values (#4 produces the same results as #3).

v                   String(v)            '' + v               v.toString()
undefined           'undefined'          'undefined'          TypeError
null                'null'               'null'               TypeError
true                'true'               'true'               'true'
123                 '123'                '123'                '123'
123n                '123'                '123'                '123'
"abc"               'abc'                'abc'                'abc'
Symbol()            'Symbol()'           TypeError            'Symbol()'
{a:1}               '[object Object]'    '[object Object]'    '[object Object]'
['a']               'a'                  'a'                  'a'
{__proto__:null}    TypeError            TypeError            TypeError
Symbol.prototype    TypeError            TypeError            TypeError
() => {}            '() => {}'           '() => {}'           '() => {}'

Let’s explore why some of these values produce exceptions or results that aren’t very useful.

22.5.2.1 Tricky values: symbols

Symbols must be converted to strings explicitly (via String() or .toString()). Conversion via concatenation throws an exception:

> '' + Symbol()
TypeError: Cannot convert a Symbol value to a string

Why is that? The intent is to prevent accidentally converting a symbol property key to a string (which is also a valid property key).
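The explicit alternatives can be sketched as follows. The variable names are made up for illustration; String(), .toString() and .description are standard.

```javascript
const sym = Symbol('my symbol');

// Explicit conversions work:
const viaString = String(sym); // 'Symbol(my symbol)'
const viaToString = sym.toString(); // 'Symbol(my symbol)'

// .description only returns the text that was passed to Symbol():
const description = sym.description; // 'my symbol'

// Interpolation converts implicitly and would throw for a symbol,
// so we convert explicitly inside the template literal:
const message = `Key: ${String(sym)}`; // 'Key: Symbol(my symbol)'
```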

22.5.2.2 Tricky values: objects with null prototypes

It’s obvious why v.toString() doesn’t work if there is no method .toString(). However, the other conversion operations call the following methods in the following order and use the first primitive value that is returned (after converting it to string):

  1. v[Symbol.toPrimitive]()
  2. v.toString()
  3. v.valueOf()

(With '' + v, .valueOf() is tried before .toString().)

If none of these methods are present, a TypeError is thrown.

> String({__proto__: null, [Symbol.toPrimitive]() {return 'YES'}})
'YES'
> String({__proto__: null, toString() {return 'YES'}})
'YES'
> String({__proto__: null, valueOf() {return 'YES'}})
'YES'

> String({__proto__: null}) // no method available
TypeError: Cannot convert object to primitive value
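To observe which methods a conversion calls, we can record the calls. This is only a sketch: the names calls, obj, hints and hintLogger are invented for this example.

```javascript
// Records which conversion methods are called, in order.
const calls = [];
const obj = {
  __proto__: null,
  toString() {
    calls.push('toString');
    return {}; // not a primitive value, so this result is ignored
  },
  valueOf() {
    calls.push('valueOf');
    return 'from valueOf';
  },
};
const result = String(obj);
// result: 'from valueOf'
// calls: ['toString', 'valueOf']

// If present, [Symbol.toPrimitive]() is the only method that is called.
// It receives a hint that tells it what kind of primitive is preferred.
const hints = [];
const hintLogger = {
  __proto__: null,
  [Symbol.toPrimitive](hint) {
    hints.push(hint);
    return 'primitive';
  },
};
String(hintLogger);
`${hintLogger}`;
'' + hintLogger;
// hints: ['string', 'string', 'default']
```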

Where might we encounter objects with null prototypes?

22.5.2.3 Tricky values: objects in general

Plain objects have default string representations that are not very useful:

> String({a: 1})
'[object Object]'

Arrays have better string representations, but they still hide much information:

> String(['a', 'b'])
'a,b'
> String(['a', ['b']])
'a,b'

> String([1, 2])
'1,2'
> String(['1', '2'])
'1,2'

> String([true])
'true'
> String(['true'])
'true'
> String(true)
'true'

22.5.2.4 Tricky value: Symbol.prototype

You’ll probably never encounter the value Symbol.prototype (the object that provides symbols with methods) in the wild but it’s an interesting edge case: Symbol.prototype[Symbol.toPrimitive]() throws an exception if this isn’t a symbol. That explains why converting Symbol.prototype to a string doesn’t work:

> Symbol.prototype[Symbol.toPrimitive]()
TypeError: Symbol.prototype [ @@toPrimitive ] requires that 'this' be a Symbol
> String(Symbol.prototype)
TypeError: Symbol.prototype [ @@toPrimitive ] requires that 'this' be a Symbol

22.5.3 Using JSON.stringify() to convert values to strings

The JSON data format is a text representation of JavaScript values. Therefore, JSON.stringify() can also be used to convert values to strings. It works especially well for objects and Arrays where the normal conversion to string has significant deficiencies:

> JSON.stringify({a: 1})
'{"a":1}'
> JSON.stringify(['a', ['b']])
'["a",["b"]]'

JSON.stringify() is OK with objects whose prototypes are null:

> JSON.stringify({__proto__: null, a: 1})
'{"a":1}'

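One major downside is that JSON.stringify() only supports the following values: null, booleans, numbers (except NaN and Infinity), strings, Arrays, and objects (except functions).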

For most other values, we get undefined as a result (and not a string):

> JSON.stringify(undefined)
undefined
> JSON.stringify(Symbol())
undefined
> JSON.stringify(() => {})
undefined

Bigints cause exceptions:

> JSON.stringify(123n)
TypeError: Do not know how to serialize a BigInt
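One possible workaround is the optional second parameter of JSON.stringify(), a replacer function that is called for each value before it is serialized. The following sketch uses a hypothetical helper, stringifyWithBigints(), and an encoding chosen for this example (a string that ends with 'n'); JSON.parse() does not turn such strings back into bigints.

```javascript
// Hypothetical helper (not a built-in): encodes bigints as strings such as '123n'.
function stringifyWithBigints(value) {
  return JSON.stringify(
    value,
    (_key, v) => (typeof v === 'bigint' ? `${v}n` : v)
  );
}

const json = stringifyWithBigints({id: 123n, tags: ['a', 456n]});
// json: '{"id":"123n","tags":["a","456n"]}'
```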

Properties with undefined-producing values are omitted:

> JSON.stringify({a: Symbol(), b: 2})
'{"b":2}'

Array elements whose values produce undefined are stringified as null:

> JSON.stringify(['a', Symbol(), 'b'])
'["a",null,"b"]'

The following table summarizes the results of JSON.stringify(v):

v                   JSON.stringify(v)
undefined           undefined
null                'null'
true                'true'
123                 '123'
123n                TypeError
'abc'               '"abc"'
Symbol()            undefined
{a:1}               '{"a":1}'
['a']               '["a"]'
() => {}            undefined
{__proto__:null}    '{}'
Symbol.prototype    '{}'

For more information, see “Details on how data is converted to JSON” (§48.3.1.3).

22.5.3.1 Multiline output

By default, JSON.stringify() returns a single line of text. However, the optional third parameter enables multiline output and lets us specify how much to indent – for example:

assert.equal(
  JSON.stringify({first: 'Robin', last: 'Doe'}, null, 2),
`{
  "first": "Robin",
  "last": "Doe"
}`
);

22.5.3.2 Displaying strings via JSON.stringify()

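JSON.stringify() is useful for displaying arbitrary strings: the result always fits on a single line, and invisible characters such as newlines and tabs become visible.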

Example:

const strWithNewlinesAndTabs = `
TAB->	<-TAB
Second line 
`;
console.log(JSON.stringify(strWithNewlinesAndTabs));

Output:

"\nTAB->\t<-TAB\nSecond line \n"

22.5.4 Solutions

Alas, there are no good built-in solutions for stringification that work all the time. In this section, we’ll explore a short function that works for all simple use cases, along with solutions for more sophisticated use cases.

22.5.4.1 Short solution: a custom toString() function

What would a simple solution for stringification look like?

JSON.stringify() works well for a lot of data, especially plain objects and Arrays. If it can’t stringify a given value, it returns undefined instead of a string – unless the value is a bigint. Then it throws an exception.

Therefore, we can use the following function for stringification:

function toString(v) {
  if (typeof v === 'bigint') {
    return v + 'n';
  }
  return JSON.stringify(v) ?? String(v); // (A)
}

For values that are not supported by JSON.stringify(), we use String() as a fallback (line A). That function only throws for the following two values – which are both handled well by JSON.stringify(): {__proto__:null} and Symbol.prototype.

The following table summarizes the results of toString():

v                   toString(v)
undefined           'undefined'
null                'null'
true                'true'
123                 '123'
123n                '123n'
'abc'               '"abc"'
Symbol()            'Symbol()'
{a:1}               '{"a":1}'
['a']               '["a"]'
() => {}            '() => {}'
{__proto__:null}    '{}'
Symbol.prototype    '{}'
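As a sketch of how this helps, here is the problematic class from the beginning of this section, rewritten to use toString(). The function is repeated so that the example is self-contained; err1 and err2 are names made up for this illustration.

```javascript
function toString(v) {
  if (typeof v === 'bigint') {
    return v + 'n';
  }
  return JSON.stringify(v) ?? String(v);
}

class UnexpectedValueError extends Error {
  constructor(value) {
    // toString() handles symbols and objects with null prototypes
    super('Unexpected value: ' + toString(value));
  }
}

const err1 = new UnexpectedValueError(Symbol());
// err1.message: 'Unexpected value: Symbol()'
const err2 = new UnexpectedValueError({__proto__: null});
// err2.message: 'Unexpected value: {}'
```

One limitation remains: JSON.stringify() throws a TypeError for circular data, so this function does too.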

22.5.4.2 Library for stringifying values

22.5.4.3 Node.js functions for stringifying values

Node.js has several built-in functions that provide sophisticated support for converting JavaScript values to strings – e.g., util.inspect() and util.format().

These functions can even handle circular data:

import * as util from 'node:util';

const cycle = {};
cycle.prop = cycle;
assert.equal(
  util.inspect(cycle),
  '<ref *1> { prop: [Circular *1] }'
);

22.5.4.4 Alternative to stringification: logging data to the console

Console methods such as console.log() tend to produce good output and have few limitations:

console.log({__proto__: null, prop: Symbol()});

Output:

[Object: null prototype] { prop: Symbol() }

However, by default, they only display objects up to a certain depth:

console.log({a: {b: {c: {d: true}}}});

Output:

{ a: { b: { c: [Object] } } }

Node.js lets us specify the depth for console.dir() – with null meaning infinite:

console.dir({a: {b: {c: {d: true}}}}, {depth: null});

Output:

{
  a: { b: { c: { d: true } } }
}

In browsers, console.dir() does not have an options object but lets us interactively and incrementally descend into objects.

22.5.5 Customizing how objects are converted to strings

22.5.5.1 Customizing the string conversion of objects

We can customize the built-in way of stringifying objects by implementing the method .toString():

const helloObj = {
  toString() {
    return 'Hello!';
  }
};
assert.equal(
  String(helloObj), 'Hello!'
);

22.5.5.2 Customizing the conversion to JSON

We can customize how an object is converted to JSON by implementing the method .toJSON():

const point = {
  x: 1,
  y: 2,
  toJSON() {
    return [this.x, this.y];
  }
};
assert.equal(
  JSON.stringify(point), '[1,2]'
);
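Both customizations can live in the same class. The following sketch uses a made-up class Point to show that .toString() affects string conversion (including template literals), while .toJSON() affects JSON.stringify(), even for nested values.

```javascript
// Hypothetical class for illustration
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
  // Used by String(), template literals and string concatenation
  toString() {
    return `(${this.x}, ${this.y})`;
  }
  // Used by JSON.stringify()
  toJSON() {
    return [this.x, this.y];
  }
}

const p = new Point(1, 2);
const text = `Point: ${p}`; // 'Point: (1, 2)'
const json = JSON.stringify({start: p}); // '{"start":[1,2]}'
```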

22.6 Comparing strings

Strings can be compared via the following operators:

< <= > >=

There is one important caveat to consider: These operators compare based on the numeric values of JavaScript characters. That means that the order that JavaScript uses for strings is different from the one used in dictionaries and phone books:

> 'A' < 'B' // ok
true
> 'a' < 'B' // not ok
false
> 'ä' < 'b' // not ok
false

Properly comparing text is beyond the scope of this book. It is supported via the ECMAScript Internationalization API (Intl).
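As a brief taste of what Intl offers, the following sketch compares strings via Intl.Collator. The locale 'en' is an arbitrary choice for this example, and the results depend on the locale data that the JavaScript platform ships with.

```javascript
// A collator that compares strings according to English conventions
const collator = new Intl.Collator('en');

// .compare() returns a negative number, zero, or a positive number:
const aBeforeB = collator.compare('a', 'B') < 0; // 'a' sorts before 'B'
const umlautBeforeB = collator.compare('ä', 'b') < 0; // 'ä' sorts before 'b'

// The compare function can be passed to .sort():
const sorted = ['b', 'ä', 'a'].sort(collator.compare);
// With typical locale data: ['a', 'ä', 'b']
```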

22.7 Atoms of text: code points, JavaScript characters, grapheme clusters

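Quick recap of “Unicode – a brief introduction (advanced)” (§21): Code points are the atomic parts of Unicode text; each one is 21 bits in size. JavaScript strings implement Unicode via the encoding format UTF-16, which uses one or two 16-bit code units to encode a single code point. Each JavaScript character (as indexed in strings) is a code unit. Grapheme clusters (user-perceived characters) consist of one or more code points.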

The following code demonstrates that a single code point comprises one or two JavaScript characters. We count the latter via .length:

// 3 code points, 3 JavaScript characters:
assert.equal('abc'.length, 3);

// 1 code point, 2 JavaScript characters:
assert.equal('🙂'.length, 2);

The following table summarizes the concepts we have just explored:

Entity                                     Size       Encoded via
JavaScript character (UTF-16 code unit)    16 bits    –
Unicode code point                         21 bits    1–2 code units
Unicode grapheme cluster                   –          1+ code points

22.7.1 Working with code units (JavaScript characters)

Indices and lengths of strings are based on JavaScript characters – which are UTF-16 code units.

22.7.1.1 Accessing code units

Code units are accessed like Array elements:

> const str = 'αβγ';
> str.length
3
> str[0]
'α'

str.split('') splits into code units:

> str.split('')
[ 'α', 'β', 'γ' ]
> 'A🙂'.split('')
[ 'A', '\uD83D', '\uDE42' ]

The emoji 🙂 consists of two code units.

22.7.1.2 Escaping code units

To specify a code unit hexadecimally, we can use a Unicode code unit escape with exactly four hexadecimal digits:

> '\u03B1\u03B2\u03B3'
'αβγ'

ASCII escapes: If the code point of a character is below 256, we can refer to it via an ASCII escape with exactly two hexadecimal digits:

> 'He\x6C\x6Co'
'Hello'

Official name of an ASCII escape: hexadecimal escape sequence

It was the first escape that used hexadecimal numbers.

22.7.1.3 Converting code units to numbers (char codes)

To get the char code of a character, we can use .charCodeAt():

> 'α'.charCodeAt(0).toString(16)
'3b1'

String.fromCharCode() converts a char code to a string:

> String.fromCharCode(0x3B1)
'α'

22.7.2 Working with code points

22.7.2.1 Accessing code points

Iteration (which is described later in this book) splits strings into code points:

const codePoints = 'A🙂';
for (const codePoint of codePoints) {
  console.log(codePoint + ' ' + codePoint.length);
}

Output:

A 1
🙂 2

Array.from() uses iteration:

> Array.from('A🙂')
[ 'A', '🙂' ]

Therefore, this is how we can count the number of code points in a string:

> Array.from('A🙂').length
2
> 'A🙂'.length
3
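Working with code points matters whenever we cut strings: slicing by code units can split a code point in half. A minimal sketch, with helper names (countCodePoints, truncateCodePoints) that are made up for this example:

```javascript
// Hypothetical helpers that work with code points instead of code units
function countCodePoints(str) {
  return Array.from(str).length;
}
function truncateCodePoints(str, maxCodePoints) {
  return Array.from(str).slice(0, maxCodePoints).join('');
}

const text = 'A🙂B';

// .slice() counts code units and cuts the emoji in half:
const brokenCut = text.slice(0, 2); // 'A\uD83D' (ends with a lone surrogate)

// Cutting by code points keeps the emoji intact:
const cleanCut = truncateCodePoints(text, 2); // 'A🙂'
```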

22.7.2.2 Escaping code points

A Unicode code point escape lets us specify a code point hexadecimally (1–6 digits). It produces one or two JavaScript characters.

> '\u{1F642}'
'🙂'

22.7.2.3 Converting code points in strings to numbers

.codePointAt() returns the code point number for a sequence of 1–2 JavaScript characters:

> '🙂'.codePointAt(0).toString(16)
'1f642'

String.fromCodePoint() converts a code point number to 1–2 JavaScript characters:

> String.fromCodePoint(0x1F642)
'🙂'
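Combined with iteration, these two methods let us inspect the code points of a whole string. The function describeCodePoints() below is a made-up helper for this sketch; the U+ notation is the usual way of writing code points.

```javascript
// Hypothetical helper: lists the code points of a string in U+ notation
function describeCodePoints(str) {
  // Array.from() iterates over code points and maps each one
  return Array.from(str, (ch) => {
    const hex = ch.codePointAt(0).toString(16).toUpperCase();
    return 'U+' + hex.padStart(4, '0');
  });
}

const described = describeCodePoints('Aα🙂');
// described: ['U+0041', 'U+03B1', 'U+1F642']

// String.fromCodePoint() accepts multiple code points:
const roundTrip = String.fromCodePoint(0x41, 0x3B1, 0x1F642);
// roundTrip: 'Aα🙂'
```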

22.7.2.4 Regular expressions for code points

If we use the flag /v for a regular expression, it supports Unicode better and matches code points, not code units:

> '🙂'.match(/./g)
[ '\uD83D', '\uDE42' ]
> '🙂'.match(/./gv)
[ '🙂' ]

More information: “Flag /v: limited support for multi-code-point grapheme clusters ES2024” (§46.11.4).

22.7.3 Working with grapheme clusters

22.7.3.1 Accessing grapheme clusters

This is a grapheme cluster that consists of 3 code points:

const graphemeCluster = '😵‍💫';
assert.deepEqual(
  // Iteration splits into code points
  Array.from(graphemeCluster),
  ['😵', '\u200D', '💫']
);
assert.equal(
  // 5 JavaScript characters
  '😵‍💫'.length, 5
);

To split a string into grapheme clusters, we can use Intl.Segmenter – a class that isn’t part of ECMAScript proper, but part of the ECMAScript Internationalization API. It is supported by most JavaScript platforms. This is how we can use it:

const segmenter = new Intl.Segmenter('en-US', { granularity: 'grapheme' });
assert.deepEqual(
  Array.from(segmenter.segment('A🙂😵‍💫')),
  [
    { segment: 'A', index: 0, input: 'A🙂😵‍💫' },
    { segment: '🙂', index: 1, input: 'A🙂😵‍💫' },
    { segment: '😵‍💫', index: 3, input: 'A🙂😵‍💫' },
  ]
);

.segment() returns an iterable over segment objects. We can use it via for-of, Array.from(), Iterator.from(), etc.
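If we only need the text of each grapheme cluster, we can extract the property .segment. A minimal sketch, assuming a platform that supports Intl.Segmenter; the function name splitGraphemes and the default locale are choices made for this example:

```javascript
// Hypothetical helper: splits a string into grapheme clusters
function splitGraphemes(str, locale = 'en-US') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  // Each segment object has the property .segment (the text of the cluster)
  return Array.from(segmenter.segment(str), (s) => s.segment);
}

// 'A', then 🙂, then the three-code-point cluster from the previous example
// (written with escapes so that the zero-width joiner is visible):
const clusters = splitGraphemes('A\u{1F642}\u{1F635}\u200D\u{1F4AB}');
// clusters.length: 3
```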

22.7.3.2 Regular expressions for grapheme clusters

The regular expression flag /v provides some limited support for grapheme clusters – e.g., we can match emojis with potentially multiple code points like this:

> 'A🙂😵‍💫'.match(/\p{RGI_Emoji}/gv)
[ '🙂', '😵‍💫' ]

More information: “Flag /v: limited support for multi-code-point grapheme clusters ES2024” (§46.11.4).

22.8 Quick reference: Strings

22.8.1 Converting to string

Table 22.1 describes how various values are converted to strings.

x            String(x)
undefined    'undefined'
null         'null'
boolean      false → 'false', true → 'true'
number       Example: 123 → '123'
bigint       Example: 123n → '123'
string       x (input, unchanged)
symbol       Example: Symbol('abc') → 'Symbol(abc)'
object       Configurable via, e.g., toString()

Table 22.1: Converting values to strings.

22.8.2 Numeric values of text atoms

22.8.3 String.prototype.*: regular expression methods

The following methods are listed in the quick reference for regular expressions:

22.8.4 String.prototype.*: finding and matching

22.8.5 String.prototype.*: extracting

22.8.6 String.prototype.*: combining

22.8.7 String.prototype.*: transforming

22.8.8 Sources of this quick reference

Exercise: Using string methods

exercises/strings/remove_extension_test.mjs