JavaScript for impatient programmers

18 Strings



Strings are primitive values in JavaScript and immutable. That is, string-related operations always produce new strings and never change existing strings.

18.1 Plain string literals

Plain string literals are delimited by either single quotes or double quotes:

const str1 = 'abc';
const str2 = "abc";
assert.equal(str1, str2);

Single quotes are used more often because they make it easier to mention HTML, where double quotes are preferred.

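The next chapter covers template literals, which give you string interpolation, multiple lines, and raw string literals (where the backslash has no special meaning).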

18.1.1 Escaping

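The backslash lets you create special characters, for example a Unix line break ('\n'), a Windows line break ('\r\n'), a tab ('\t'), and a backslash ('\\').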

The backslash also lets you use the delimiter of a string literal inside that literal:

assert.equal(
  'She said: "Let\'s go!"',
  "She said: \"Let's go!\"");

18.2 Accessing characters and code points

18.2.1 Accessing JavaScript characters

JavaScript has no extra data type for characters – characters are always represented as strings.

const str = 'abc';

// Reading a character at a given index
assert.equal(str[1], 'b');

// Counting the characters in a string:
assert.equal(str.length, 3);

18.2.2 Accessing Unicode code point characters via for-of and spreading

Iterating over strings via for-of or spreading (...) visits Unicode code point characters. Each code point character is encoded by 1–2 JavaScript characters. For more information, see §18.6 “Atoms of text: Unicode characters, JavaScript characters, grapheme clusters”.

This is how you iterate over the code point characters of a string via for-of:

for (const ch of 'x🙂y') {
  console.log(ch);
}
// Output:
// 'x'
// '🙂'
// 'y'

And this is how you convert a string into an Array of code point characters via spreading:

assert.deepEqual([...'x🙂y'], ['x', '🙂', 'y']);

18.3 String concatenation via +

If at least one operand is a string, the plus operator (+) converts any non-strings to strings and concatenates the result:

assert.equal(3 + ' times ' + 4, '3 times 4');

The assignment operator += is useful if you want to assemble a string, piece by piece:

let str = ''; // must be `let`!
str += 'Say it';
str += ' one more';
str += ' time';

assert.equal(str, 'Say it one more time');

  Concatenating via + is efficient

Using + to assemble strings is quite efficient because most JavaScript engines internally optimize it.

  Exercise: Concatenating strings

exercises/strings/concat_string_array_test.mjs

18.4 Converting to string

There are three ways of converting a value x to a string: String(x), '' + x, and x.toString() (which does not work for undefined and null).

Recommendation: use the descriptive and safe String().

Examples:

assert.equal(String(undefined), 'undefined');
assert.equal(String(null), 'null');

assert.equal(String(false), 'false');
assert.equal(String(true), 'true');

assert.equal(String(123.45), '123.45');

Pitfall for booleans: If you convert a boolean to a string via String(), you generally can’t convert it back via Boolean():

> String(false)
'false'
> Boolean('false')
true

The only string for which Boolean() returns false is the empty string.

18.4.1 Stringifying objects

Plain objects have a default string representation that is not very useful:

> String({a: 1})
'[object Object]'

Arrays have a better string representation, but it still hides a lot of information:

> String(['a', 'b'])
'a,b'
> String(['a', ['b']])
'a,b'

> String([1, 2])
'1,2'
> String(['1', '2'])
'1,2'

> String([true])
'true'
> String(['true'])
'true'
> String(true)
'true'

Stringifying a function returns its source code:

> String(function f() {return 4})
'function f() {return 4}'

18.4.2 Customizing the stringification of objects

You can override the built-in way of stringifying objects by implementing the method toString():

const obj = {
  toString() {
    return 'hello';
  }
};

assert.equal(String(obj), 'hello');

18.4.3 An alternate way of stringifying values

The JSON data format is a text representation of JavaScript values. Therefore, JSON.stringify() can also be used to convert values to strings:

> JSON.stringify({a: 1})
'{"a":1}'
> JSON.stringify(['a', ['b']])
'["a",["b"]]'

The caveat is that JSON only supports null, booleans, numbers, strings, Arrays, and objects (which it always treats as if they were created by object literals).

Tip: The third parameter lets you switch on multiline output and specify how much to indent – for example:

console.log(JSON.stringify({first: 'Jane', last: 'Doe'}, null, 2));

This statement produces the following output:

{
  "first": "Jane",
  "last": "Doe"
}

18.5 Comparing strings

Strings can be compared via the following operators:

< <= > >=

There is one important caveat to consider: These operators compare based on the numeric values of JavaScript characters. That means that the order that JavaScript uses for strings is different from the one used in dictionaries and phone books:

> 'A' < 'B' // ok
true
> 'a' < 'B' // not ok
false
> 'ä' < 'b' // not ok
false

Properly comparing text is beyond the scope of this book. It is supported via the ECMAScript Internationalization API (Intl).

18.6 Atoms of text: Unicode characters, JavaScript characters, grapheme clusters

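Quick recap of §17 “Unicode – a brief introduction”: Unicode characters are represented by code points, numbers with a range of 21 bits. In JavaScript strings, Unicode is implemented via code units based on the encoding format UTF-16. Each code unit is a 16-bit number, and one or two code units are needed to encode a single code point. Grapheme clusters (user-perceived characters) are written via one or more Unicode characters.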

The following code demonstrates that a single Unicode character comprises one or two JavaScript characters. We count the latter via .length:

// 3 Unicode characters, 3 JavaScript characters:
assert.equal('abc'.length, 3);

// 1 Unicode character, 2 JavaScript characters:
assert.equal('🙂'.length, 2);

The following table summarizes the concepts we have just explored:

Entity                Numeric representation  Size     Encoded via
Grapheme cluster                                       1+ code points
Unicode character     Code point              21 bits  1–2 code units
JavaScript character  UTF-16 code unit        16 bits

18.6.1 Working with code points

Let’s explore JavaScript’s tools for working with code points.

A code point escape lets you specify a code point hexadecimally. It produces one or two JavaScript characters.

> '\u{1F642}'
'🙂'

String.fromCodePoint() converts a single code point to 1–2 JavaScript characters:

> String.fromCodePoint(0x1F642)
'🙂'

.codePointAt() converts 1–2 JavaScript characters to a single code point:

> '🙂'.codePointAt(0).toString(16)
'1f642'

You can iterate over a string, which visits Unicode characters (not JavaScript characters). Iteration is described later in this book. One way of iterating is via a for-of loop:

const str = '🙂a';
assert.equal(str.length, 3);

for (const codePointChar of str) {
  console.log(codePointChar);
}

// Output:
// '🙂'
// 'a'

Spreading (...) into Array literals is also based on iteration and visits Unicode characters:

> [...'🙂a']
[ '🙂', 'a' ]

That makes it a good tool for counting Unicode characters:

> [...'🙂a'].length
2
> '🙂a'.length
3

18.6.2 Working with code units (char codes)

Indices and lengths of strings are based on JavaScript characters (as represented by UTF-16 code units).

To specify a code unit hexadecimally, you can use a code unit escape:

> '\uD83D\uDE42'
'🙂'

And you can use String.fromCharCode(). Char code is the standard library’s name for code unit:

> String.fromCharCode(0xD83D) + String.fromCharCode(0xDE42)
'🙂'

To get the char code of a character, use .charCodeAt():

> '🙂'.charCodeAt(0).toString(16)
'd83d'

18.6.3 Caveat: grapheme clusters

When working with text that may be written in any human language, it’s best to split at the boundaries of grapheme clusters, not at the boundaries of Unicode characters.

TC39 is working on Intl.Segmenter, a proposal for the ECMAScript Internationalization API to support Unicode segmentation (along grapheme cluster boundaries, word boundaries, sentence boundaries, etc.).

Until that proposal becomes a standard, you can use one of several libraries that are available (do a web search for “JavaScript grapheme”).

18.7 Quick reference: Strings

Strings are immutable; none of the string methods ever modify their strings.

18.7.1 Converting to string

Tbl. 13 describes how various values are converted to strings.

Table 13: Converting values to strings.
x              String(x)
undefined      'undefined'
null           'null'
Boolean value  false → 'false', true → 'true'
Number value   Example: 123 → '123'
String value   x (input, unchanged)
An object      Configurable via, e.g., toString()

18.7.2 Numeric values of characters

18.7.3 String operators

// Access characters via []
const str = 'abc';
assert.equal(str[1], 'b');

// Concatenate strings via +
assert.equal('a' + 'b' + 'c', 'abc');
assert.equal('take ' + 3 + ' oranges', 'take 3 oranges');

18.7.4 String.prototype: finding and matching

(String.prototype is where the methods of strings are stored.)

18.7.5 String.prototype: extracting

18.7.6 String.prototype: combining

18.7.7 String.prototype: transforming

18.7.8 Sources

  Exercise: Using string methods

exercises/strings/remove_extension_test.mjs

  Quiz

See quiz app.