JavaScript for impatient programmers (ES2022 edition)

20 Strings



20.1 Cheat sheet: strings

Strings are primitive values in JavaScript and immutable. That is, string-related operations always produce new strings and never change existing strings.
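The immutability described above can be observed directly. The following is a minimal sketch; the variable names are illustrative and not taken from the book:

```javascript
// Illustrative values only: string operations return new strings.
const original = 'hello';
const upper = original.toUpperCase(); // a new string
const longer = original + '!';        // also a new string

console.log(original); // prints: hello (unchanged)
console.log(upper);    // prints: HELLO
console.log(longer);   // prints: hello!
```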

20.1.1 Working with strings

Literals for strings:

const str1 = 'Don\'t say "goodbye"'; // string literal
const str2 = "Don't say \"goodbye\""; // string literal
assert.equal(
  `As easy as ${123}!`, // template literal
  'As easy as 123!',
);

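Backslashes are used to escape the delimiters of a string literal (as in the first two lines of the previous example) and to write special characters such as \\ (backslash), \n (newline), \r (carriage return), and \t (tab).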

Inside a String.raw tagged template (line A), backslashes are treated as normal characters:

assert.equal(
  String.raw`\ \n\t`, // (A)
  '\\ \\n\\t',
);

Converting values to strings:

> String(undefined)
'undefined'
> String(null)
'null'
> String(123.45)
'123.45'
> String(true)
'true'

Copying parts of a string:

// There is no type for characters;
// reading characters produces strings:
const str3 = 'abc';
assert.equal(
  str3[2], 'c' // no negative indices allowed
);
assert.equal(
  str3.at(-1), 'c' // negative indices allowed
);

// Copying more than one character:
assert.equal(
  'abc'.slice(0, 2), 'ab'
);

Concatenating strings:

assert.equal(
  'I bought ' + 3 + ' apples',
  'I bought 3 apples',
);

let str = '';
str += 'I bought ';
str += 3;
str += ' apples';
assert.equal(
  str, 'I bought 3 apples',
);

20.1.2 JavaScript characters vs. code points vs. grapheme clusters

JavaScript characters are 16 bits in size. They are what is indexed in strings and what .length counts.

Code points are the atomic parts of Unicode text. Most of them fit into one JavaScript character, some of them occupy two (especially emojis):

assert.equal(
  'A'.length, 1
);
assert.equal(
  '🙂'.length, 2
);

Grapheme clusters (user-perceived characters) represent written symbols. Each one comprises one or more code points.

Because of this, we shouldn’t split text into JavaScript characters; we should split it into grapheme clusters. For more information on how to handle text, see §20.7 “Atoms of text: code points, JavaScript characters, grapheme clusters”.

20.1.3 String methods

This subsection gives a brief overview of the string API. There is a more comprehensive quick reference at the end of this chapter.

Finding substrings:

> 'abca'.includes('a')
true
> 'abca'.startsWith('ab')
true
> 'abca'.endsWith('ca')
true

> 'abca'.indexOf('a')
0
> 'abca'.lastIndexOf('a')
3

Splitting and joining:

assert.deepEqual(
  'a, b,c'.split(/, ?/),
  ['a', 'b', 'c']
);
assert.equal(
  ['a', 'b', 'c'].join(', '),
  'a, b, c'
);

Padding and trimming:

> '7'.padStart(3, '0')
'007'
> 'yes'.padEnd(6, '!')
'yes!!!'

> '\t abc\n '.trim()
'abc'
> '\t abc\n '.trimStart()
'abc\n '
> '\t abc\n '.trimEnd()
'\t abc'

Repeating and changing case:

> '*'.repeat(5)
'*****'
> '= b2b ='.toUpperCase()
'= B2B ='
> 'ΑΒΓ'.toLowerCase()
'αβγ'

20.2 Plain string literals

Plain string literals are delimited by either single quotes or double quotes:

const str1 = 'abc';
const str2 = "abc";
assert.equal(str1, str2);

Single quotes are used more often because they make it easier to mention HTML, where double quotes are preferred.

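The next chapter covers template literals, which give us string interpolation, multi-line strings, and raw string literals (in which backslashes have no special meaning).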

20.2.1 Escaping

The backslash lets us create special characters, for example '\n' (Unix line break), '\r\n' (Windows line break), '\t' (tab), and '\\' (backslash).

The backslash also lets us use the delimiter of a string literal inside that literal:

assert.equal(
  'She said: "Let\'s go!"',
  "She said: \"Let's go!\"");

20.3 Accessing JavaScript characters

JavaScript has no extra data type for characters – characters are always represented as strings.

const str = 'abc';

// Reading a JavaScript character at a given index
assert.equal(str[1], 'b');

// Counting the JavaScript characters in a string:
assert.equal(str.length, 3);

The characters we see on screen are called grapheme clusters. Most of them are represented by single JavaScript characters. However, there are also grapheme clusters (especially emojis) that are represented by multiple JavaScript characters:

> '🙂'.length
2

How that works is explained in §20.7 “Atoms of text: code points, JavaScript characters, grapheme clusters”.

20.4 String concatenation via +

If at least one operand is a string, the plus operator (+) converts any non-strings to strings and concatenates the result:

assert.equal(3 + ' times ' + 4, '3 times 4');
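One consequence worth spelling out: + is evaluated from left to right, so where the first string appears determines when numeric addition turns into concatenation. A small sketch with made-up values:

```javascript
// Illustrative values only: + is left-associative.
const numbersFirst = 1 + 2 + ' apples';  // (1 + 2) + ' apples'
const stringFirst = 'apples: ' + 1 + 2;  // ('apples: ' + 1) + 2
const grouped = 'apples: ' + (1 + 2);    // parentheses force the addition

console.log(numbersFirst); // prints: 3 apples
console.log(stringFirst);  // prints: apples: 12
console.log(grouped);      // prints: apples: 3
```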

The assignment operator += is useful if we want to assemble a string, piece by piece:

let str = ''; // must be `let`!
str += 'Say it';
str += ' one more';
str += ' time';

assert.equal(str, 'Say it one more time');

  Concatenating via + is efficient

Using + to assemble strings is quite efficient because most JavaScript engines internally optimize it.

  Exercise: Concatenating strings

exercises/strings/concat_string_array_test.mjs

20.5 Converting to string

These are three ways of converting a value x to a string: String(x), ''+x, and x.toString() (which does not work for undefined and null).

Recommendation: use the descriptive and safe String().

Examples:

assert.equal(String(undefined), 'undefined');
assert.equal(String(null), 'null');

assert.equal(String(false), 'false');
assert.equal(String(true), 'true');

assert.equal(String(123.45), '123.45');
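To illustrate why String() is the safe choice, the following sketch compares String(x), '' + x, and x.toString() on values where they differ. The helper attempt() is hypothetical and only exists for this comparison:

```javascript
// Hypothetical helper: returns the conversion result,
// or the name of the error that the conversion threw.
function attempt(convert, value) {
  try {
    return convert(value);
  } catch (err) {
    return err.name;
  }
}

const viaString = (x) => String(x);
const viaPlus = (x) => '' + x;
const viaMethod = (x) => x.toString();

console.log(attempt(viaString, null)); // prints: null
console.log(attempt(viaPlus, null));   // prints: null
console.log(attempt(viaMethod, null)); // prints: TypeError

const sym = Symbol('s');
console.log(attempt(viaString, sym)); // prints: Symbol(s)
console.log(attempt(viaPlus, sym));   // prints: TypeError
console.log(attempt(viaMethod, sym)); // prints: Symbol(s)
```

Of the three, only String() handles both null/undefined and symbols without throwing.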

Pitfall for booleans: If we convert a boolean to a string via String(), we generally can’t convert it back via Boolean():

> String(false)
'false'
> Boolean('false')
true

The only string for which Boolean() returns false is the empty string.
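If a stringified boolean does need to be converted back, one option is to compare against the two expected strings explicitly. The function parseBoolean() below is a hypothetical helper, not part of the standard library:

```javascript
// Hypothetical helper: the inverse of String() for booleans.
function parseBoolean(str) {
  if (str === 'true') return true;
  if (str === 'false') return false;
  throw new SyntaxError(`Not a boolean string: ${str}`);
}

console.log(parseBoolean(String(false))); // prints: false
console.log(parseBoolean(String(true)));  // prints: true
```

JSON.parse('false') would also return false, but it accepts any JSON text, not only booleans.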

20.5.1 Stringifying objects

Plain objects have a default string representation that is not very useful:

> String({a: 1})
'[object Object]'

Arrays have a better string representation, but it still hides much information:

> String(['a', 'b'])
'a,b'
> String(['a', ['b']])
'a,b'

> String([1, 2])
'1,2'
> String(['1', '2'])
'1,2'

> String([true])
'true'
> String(['true'])
'true'
> String(true)
'true'

Stringifying functions returns their source code:

> String(function f() {return 4})
'function f() {return 4}'

20.5.2 Customizing the stringification of objects

We can override the built-in way of stringifying objects by implementing the method toString():

const obj = {
  toString() {
    return 'hello';
  }
};

assert.equal(String(obj), 'hello');
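A custom toString() is also picked up by template literals and by the + operator. The sketch below assumes a made-up class Point that does not override valueOf() or Symbol.toPrimitive:

```javascript
// Point is a hypothetical class used only for illustration.
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
  toString() {
    return `(${this.x}, ${this.y})`;
  }
}

const point = new Point(1, 2);
const viaStringFn = String(point);
const viaTemplate = `Position: ${point}`;
const viaConcat = 'Position: ' + point;

console.log(viaStringFn); // prints: (1, 2)
console.log(viaTemplate); // prints: Position: (1, 2)
console.log(viaConcat);   // prints: Position: (1, 2)
```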

20.5.3 An alternate way of stringifying values

The JSON data format is a text representation of JavaScript values. Therefore, JSON.stringify() can also be used to convert values to strings:

> JSON.stringify({a: 1})
'{"a":1}'
> JSON.stringify(['a', ['b']])
'["a",["b"]]'

The caveat is that JSON only supports null, booleans, numbers, strings, Arrays, and objects (which it always treats as if they were created by object literals).

Tip: The third parameter lets us switch on multiline output and specify how much to indent – for example:

console.log(JSON.stringify({first: 'Jane', last: 'Doe'}, null, 2));

This statement produces the following output:

{
  "first": "Jane",
  "last": "Doe"
}
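The caveat about JSON's limited set of supported values can be made concrete. The following sketch uses made-up data to show what happens to values that JSON cannot represent:

```javascript
// Illustrative data: some property values have no JSON equivalent.
const data = {
  name: 'Jane',
  missing: undefined,  // omitted from the output
  greet() {},          // methods are omitted, too
  when: new Date(0),   // Dates are converted to ISO strings
};
const json = JSON.stringify(data);
console.log(json);
// prints: {"name":"Jane","when":"1970-01-01T00:00:00.000Z"}

// Inside Arrays, unsupported values become null:
const arrayJson = JSON.stringify([undefined, NaN]);
console.log(arrayJson); // prints: [null,null]

// Bigints are not supported at all; stringifying them throws:
let bigintError = null;
try {
  JSON.stringify({ count: 10n });
} catch (err) {
  bigintError = err.name;
}
console.log(bigintError); // prints: TypeError
```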

20.6 Comparing strings

Strings can be compared via the following operators:

< <= > >=

There is one important caveat to consider: These operators compare based on the numeric values of JavaScript characters. That means that the order that JavaScript uses for strings is different from the one used in dictionaries and phone books:

> 'A' < 'B' // ok
true
> 'a' < 'B' // not ok
false
> 'ä' < 'b' // not ok
false

Properly comparing text is beyond the scope of this book. It is supported via the ECMAScript Internationalization API (Intl).
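As a brief pointer only, this is roughly what a locale-aware comparison via Intl.Collator looks like. The sketch assumes a runtime with Intl support and uses the locale 'en' as an arbitrary choice; the exact ordering depends on the locale and on the engine's locale data:

```javascript
// Assumption: Intl.Collator is available and has data for 'en'.
const collator = new Intl.Collator('en');

// .compare() returns a negative number if the first string sorts
// before the second one, a positive number if it sorts after it.
const lowerBeforeUpper = collator.compare('a', 'B') < 0;
const umlautBeforeB = collator.compare('ä', 'b') < 0;

console.log('a' < 'B', lowerBeforeUpper); // prints: false true
console.log('ä' < 'b', umlautBeforeB);    // prints: false true

// The compare function can be passed to .sort():
const sorted = ['b', 'ä', 'B', 'a'].sort(collator.compare);
console.log(sorted[0]); // prints: a
```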

20.7 Atoms of text: code points, JavaScript characters, grapheme clusters

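Quick recap of §19 “Unicode – a brief introduction”: Code points are the atomic parts of Unicode text; each one is 21 bits in size. JavaScript strings implement Unicode via UTF-16, which encodes a code point as one or two 16-bit code units; each JavaScript character (as indexed in strings) is a code unit. Grapheme clusters (user-perceived characters) are written symbols, each encoded by one or more code points.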

The following code demonstrates that a single code point comprises one or two JavaScript characters. We count the latter via .length:

// 3 code points, 3 JavaScript characters:
assert.equal('abc'.length, 3);

// 1 code point, 2 JavaScript characters:
assert.equal('🙂'.length, 2);

The following table summarizes the concepts we have just explored:

Entity                                    Size      Encoded via
JavaScript character (UTF-16 code unit)   16 bits   –
Unicode code point                        21 bits   1–2 code units
Unicode grapheme cluster                  –         1+ code points

20.7.1 Working with code points

Let’s explore JavaScript’s tools for working with code points.

A Unicode code point escape lets us specify a code point hexadecimally (1–6 digits for values up to 10FFFF). It produces one or two JavaScript characters.

> '\u{1F642}'
'🙂'

  Unicode escape sequences

In the ECMAScript language specification, Unicode code point escapes and Unicode code unit escapes (which we’ll encounter later) are called Unicode escape sequences.

String.fromCodePoint() converts a single code point to 1–2 JavaScript characters:

> String.fromCodePoint(0x1F642)
'🙂'

.codePointAt() converts 1–2 JavaScript characters to a single code point:

> '🙂'.codePointAt(0).toString(16)
'1f642'

We can iterate over a string, which visits code points (not JavaScript characters). Iteration is described later in this book. One way of iterating is via a for-of loop:

const str = '🙂a';
assert.equal(str.length, 3);

for (const codePointChar of str) {
  console.log(codePointChar);
}

// Output:
// '🙂'
// 'a'

Array.from() is also based on iteration and visits code points:

> Array.from('🙂a')
[ '🙂', 'a' ]

That makes it a good tool for counting code points:

> Array.from('🙂a').length
2
> '🙂a'.length
3
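Combining iteration with .codePointAt() gives us the numeric code points of a string. The function codePointsOf() is a hypothetical helper written for this sketch:

```javascript
// Hypothetical helper: lists the code points of a string as numbers.
function codePointsOf(str) {
  // Array.from() iterates over code points; each `ch` is a string
  // with 1–2 JavaScript characters.
  return Array.from(str, (ch) => ch.codePointAt(0));
}

const codePoints = codePointsOf('\u{1F642}a'); // '🙂a'
const hexCodePoints = codePoints.map((cp) => cp.toString(16));

console.log(hexCodePoints.join(' ')); // prints: 1f642 61
console.log(codePoints.length);       // prints: 2
```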

20.7.2 Working with code units (char codes)

Indices and lengths of strings are based on JavaScript characters (as represented by UTF-16 code units).

To specify a code unit hexadecimally, we can use a Unicode code unit escape with exactly four hexadecimal digits:

> '\uD83D\uDE42'
'🙂'

And we can use String.fromCharCode(). Char code is the standard library’s name for code unit:

> String.fromCharCode(0xD83D) + String.fromCharCode(0xDE42)
'🙂'

To get the char code of a character, use .charCodeAt():

> '🙂'.charCodeAt(0).toString(16)
'd83d'
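The two code units of such a pair (a UTF-16 surrogate pair) can be combined arithmetically into the code point they encode. The function combineSurrogates() is a hypothetical helper that follows the UTF-16 decoding formula and assumes it is given a valid high and low surrogate:

```javascript
// Hypothetical helper: decodes a UTF-16 surrogate pair.
// Assumes 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF.
function combineSurrogates(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const smiley = '\u{1F642}'; // '🙂'
const high = smiley.charCodeAt(0);
const low = smiley.charCodeAt(1);
const combined = combineSurrogates(high, low);

console.log(high.toString(16));     // prints: d83d
console.log(low.toString(16));      // prints: de42
console.log(combined.toString(16)); // prints: 1f642
console.log(combined === smiley.codePointAt(0)); // prints: true
```

In practice, .codePointAt() does this decoding for us; the sketch only shows what happens under the hood.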

20.7.3 ASCII escapes

If the code point of a character is below 256, we can refer to it via an ASCII escape with exactly two hexadecimal digits:

> 'He\x6C\x6Co'
'Hello'

(The official name of ASCII escapes is Hexadecimal escape sequences – it was the first escape that used hexadecimal numbers.)

20.7.4 Caveat: grapheme clusters

When working with text that may be written in any human language, it’s best to split at the boundaries of grapheme clusters, not at the boundaries of code points.

Intl.Segmenter, which is part of the ECMAScript Internationalization API, supports Unicode segmentation (along grapheme cluster boundaries, word boundaries, sentence boundaries, etc.).

On platforms that don’t support it yet, we can use one of several libraries that are available (do a web search for “JavaScript grapheme”).
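In engines that ship Intl.Segmenter, counting grapheme clusters might look like the following sketch. It assumes that the runtime supports Intl.Segmenter; countGraphemes() is a hypothetical helper and the default locale 'en' is an arbitrary choice:

```javascript
// Hypothetical helper. Assumption: Intl.Segmenter is available.
function countGraphemes(str, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  // .segment() returns an iterable with one entry per grapheme cluster
  return Array.from(segmenter.segment(str)).length;
}

// 'e' followed by a combining acute accent: displayed as one symbol
const accented = 'e\u0301';

console.log(accented.length);             // prints: 2 (JavaScript characters)
console.log(Array.from(accented).length); // prints: 2 (code points)
console.log(countGraphemes(accented));    // prints: 1 (grapheme cluster)
```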

20.8 Quick reference: Strings

20.8.1 Converting to string

Tbl. 14 describes how various values are converted to strings.

Table 14: Converting values to strings.
x           String(x)
undefined   'undefined'
null        'null'
boolean     false → 'false', true → 'true'
number      Example: 123 → '123'
bigint      Example: 123n → '123'
string      x (input, unchanged)
symbol      Example: Symbol('abc') → 'Symbol(abc)'
object      Configurable via, e.g., toString()

20.8.2 Numeric values of text atoms

20.8.3 String.prototype: finding and matching

(String.prototype is where the methods of strings are stored.)

20.8.4 String.prototype: extracting

20.8.5 String.prototype: combining

20.8.6 String.prototype: transforming

20.8.7 Sources

  Exercise: Using string methods

exercises/strings/remove_extension_test.mjs

  Quiz

See quiz app.