Strings are primitive values in JavaScript and immutable. That is, string-related operations always produce new strings and never change existing strings.
Literals for strings:
import * as assert from 'node:assert/strict'; // used by the examples in this chapter

const str1 = 'Don\'t say "goodbye"'; // string literal
const str2 = "Don't say \"goodbye\""; // string literal
assert.equal(
  `As easy as ${123}!`, // template literal
  'As easy as 123!',
);
Backslashes are used to escape literal delimiters (as in the first two lines of the previous example) and to represent special characters:
\\ represents a backslash
\n represents a newline
\r represents a carriage return
\t represents a tab
Inside a String.raw tagged template (line A), backslashes are treated as normal characters:
assert.equal(
  String.raw`\ \n\t`, // (A)
  '\\ \\n\\t',
);
Converting values to strings:
> String(undefined)
'undefined'
> String(null)
'null'
> String(123.45)
'123.45'
> String(true)
'true'
Copying parts of a string:
// There is no type for characters;
// reading characters produces strings:
const str3 = 'abc';
assert.equal(
  str3[2], 'c' // no negative indices allowed
);
assert.equal(
  str3.at(-1), 'c' // negative indices allowed
);

// Copying more than one character:
assert.equal(
  'abc'.slice(0, 2), 'ab'
);
Concatenating strings:
assert.equal(
  'I bought ' + 3 + ' apples',
  'I bought 3 apples',
);

let str = '';
str += 'I bought ';
str += 3;
str += ' apples';
assert.equal(
  str, 'I bought 3 apples',
);
JavaScript characters are 16 bits in size. They are what is indexed in strings and what .length counts.
Code points are the atomic parts of Unicode text. Most of them fit into one JavaScript character; some of them occupy two (especially emojis):
assert.equal(
  'A'.length, 1
);
assert.equal(
  '🙂'.length, 2
);
Grapheme clusters (user-perceived characters) represent written symbols. Each one comprises one or more code points.
Due to these facts, we shouldn’t split text into JavaScript characters; we should split it into grapheme clusters. For more information on how to handle text, see §20.7 “Atoms of text: code points, JavaScript characters, grapheme clusters”.
This subsection gives a brief overview of the string API. There is a more comprehensive quick reference at the end of this chapter.
Finding substrings:
> 'abca'.includes('a')
true
> 'abca'.startsWith('ab')
true
> 'abca'.endsWith('ca')
true
> 'abca'.indexOf('a')
0
> 'abca'.lastIndexOf('a')
3
Splitting and joining:
assert.deepEqual(
  'a, b,c'.split(/, ?/),
  ['a', 'b', 'c']
);
assert.equal(
  ['a', 'b', 'c'].join(', '),
  'a, b, c'
);
Padding and trimming:
> '7'.padStart(3, '0')
'007'
> 'yes'.padEnd(6, '!')
'yes!!!'
> '\t abc\n '.trim()
'abc'
> '\t abc\n '.trimStart()
'abc\n '
> '\t abc\n '.trimEnd()
'\t abc'
Repeating and changing case:
> '*'.repeat(5)
'*****'
> '= b2b ='.toUpperCase()
'= B2B ='
> 'ΑΒΓ'.toLowerCase()
'αβγ'
Plain string literals are delimited by either single quotes or double quotes:
const str1 = 'abc';
const str2 = "abc";
assert.equal(str1, str2);
Single quotes are used more often because they make it easier to embed HTML, where double quotes are preferred.
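The next chapter covers template literals, which give us string interpolation, multi-line strings, and raw string literals (in which the backslash has no special meaning).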
The backslash lets us create special characters:
Unix line break: '\n'
Windows line break: '\r\n'
Tab: '\t'
Backslash: '\\'
The backslash also lets us use the delimiter of a string literal inside that literal:
assert.equal(
  'She said: "Let\'s go!"',
  "She said: \"Let's go!\"");
JavaScript has no extra data type for characters – characters are always represented as strings.
const str = 'abc';

// Reading a JavaScript character at a given index:
assert.equal(str[1], 'b');

// Counting the JavaScript characters in a string:
assert.equal(str.length, 3);
The characters we see on screen are called grapheme clusters. Most of them are represented by single JavaScript characters. However, there are also grapheme clusters (especially emojis) that are represented by multiple JavaScript characters:
> '🙂'.length
2
How that works is explained in §20.7 “Atoms of text: code points, JavaScript characters, grapheme clusters”.
If at least one operand is a string, the plus operator (+) converts any non-strings to strings and concatenates the result:
assert.equal(3 + ' times ' + 4, '3 times 4');
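Because + is evaluated from left to right, the position of the first string operand matters: numbers before it are still added. The following sketch shows this (the variable names are illustrative only):

```javascript
// + is evaluated from left to right: numbers are added until the
// first string operand turns the operation into concatenation.
const numbersFirst = 1 + 2 + ' apples';     // (1 + 2) + ' apples'
const stringFirst = 'apples: ' + 1 + 2;     // ('apples: ' + 1) + 2
const parenthesized = 'apples: ' + (1 + 2); // parentheses add first

console.log(numbersFirst);  // 3 apples
console.log(stringFirst);   // apples: 12
console.log(parenthesized); // apples: 3
```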
The assignment operator += is useful if we want to assemble a string, piece by piece:
let str = ''; // must be `let`!
str += 'Say it';
str += ' one more';
str += ' time';

assert.equal(str, 'Say it one more time');
Concatenating via + is efficient
Using + to assemble strings is quite efficient because most JavaScript engines optimize it internally.
Exercise: Concatenating strings
exercises/strings/concat_string_array_test.mjs
These are three ways of converting a value x to a string:
String(x)
''+x
x.toString() (does not work for undefined and null)
Recommendation: use the descriptive and safe String().
Examples:
assert.equal(String(undefined), 'undefined');
assert.equal(String(null), 'null');

assert.equal(String(false), 'false');
assert.equal(String(true), 'true');

assert.equal(String(123.45), '123.45');
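To see why String() is the safe choice, we can try all three conversions on a few values. This is a small sketch; the helper convertAllWays() is hypothetical and simply reports a conversion that throws via the name of the error class. Note that symbols are another case where the approaches differ: ''+x throws for them, while String(x) does not.

```javascript
// Hypothetical helper: applies the three conversions to `x`.
// A conversion that throws is reported by the name of its error class.
function convertAllWays(x) {
  const attempt = (convert) => {
    try {
      return convert();
    } catch (err) {
      return err.constructor.name;
    }
  };
  return [
    attempt(() => String(x)),
    attempt(() => '' + x),
    attempt(() => x.toString()),
  ];
}

console.log(convertAllWays(123));         // [ '123', '123', '123' ]
console.log(convertAllWays(null));        // [ 'null', 'null', 'TypeError' ]
console.log(convertAllWays(Symbol('a'))); // [ 'Symbol(a)', 'TypeError', 'Symbol(a)' ]
```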
Pitfall for booleans: If we convert a boolean to a string via String(), we generally can’t convert it back via Boolean():
> String(false)
'false'
> Boolean('false')
true
The only string for which Boolean() returns false is the empty string.
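If we do need to turn 'true' and 'false' back into booleans, we have to inspect the content of the string. A minimal sketch; parseBooleanString() is a hypothetical helper, not part of the standard library, and rejecting all other strings is one possible design choice:

```javascript
// Hypothetical helper: reverses String(boolean).
// Unlike Boolean(), it looks at what the string says.
function parseBooleanString(str) {
  if (str === 'true') return true;
  if (str === 'false') return false;
  throw new TypeError(`Not a stringified boolean: ${JSON.stringify(str)}`);
}

console.log(Boolean('false'));                 // true (non-empty string)
console.log(parseBooleanString('false'));      // false
console.log(parseBooleanString(String(true))); // true
```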
Plain objects have a default string representation that is not very useful:
> String({a: 1})
'[object Object]'
Arrays have a better string representation, but it still hides much information:
> String(['a', 'b'])
'a,b'
> String(['a', ['b']])
'a,b'
> String([1, 2])
'1,2'
> String(['1', '2'])
'1,2'
> String([true])
'true'
> String(['true'])
'true'
> String(true)
'true'
Stringifying a function returns its source code:
> String(function f() {return 4})
'function f() {return 4}'
We can override the built-in way of stringifying objects by implementing the method toString():
const obj = {
  toString() {
    return 'hello';
  },
};

assert.equal(String(obj), 'hello');
The JSON data format is a text representation of JavaScript values. Therefore, JSON.stringify() can also be used to convert values to strings:
> JSON.stringify({a: 1})
'{"a":1}'
> JSON.stringify(['a', ['b']])
'["a",["b"]]'
The caveat is that JSON only supports null, booleans, numbers, strings, Arrays, and objects (which it always treats as if they were created by object literals).
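What happens to values outside that set depends on where they appear. The following sketch illustrates the behavior of JSON.stringify() for undefined, functions, symbols, and bigints:

```javascript
// Object properties whose values JSON can't represent are omitted:
const fromObject = JSON.stringify({ a: undefined, b: () => 1, c: Symbol('x'), d: 'ok' });
// '{"d":"ok"}'

// Array elements of that kind become null:
const fromArray = JSON.stringify([undefined, () => 1]);
// '[null,null]'

// At the top level, the result is undefined, not a string:
const topLevel = JSON.stringify(undefined);

// Bigints cause an exception:
let bigintError = null;
try {
  JSON.stringify(123n);
} catch (err) {
  bigintError = err; // a TypeError
}
```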
Tip: The third parameter lets us switch on multiline output and specify how much to indent – for example:
console.log(JSON.stringify({first: 'Jane', last: 'Doe'}, null, 2));
This statement produces the following output:
{
"first": "Jane",
"last": "Doe"
}
Strings can be compared via the following operators:
< <= > >=
There is one important caveat to consider: These operators compare based on the numeric values of JavaScript characters. That means that the order that JavaScript uses for strings is different from the one used in dictionaries and phone books:
> 'A' < 'B' // ok
true
> 'a' < 'B' // not ok
false
> 'ä' < 'b' // not ok
false
Properly comparing text is beyond the scope of this book. It is supported via the ECMAScript Internationalization API (Intl).
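As a brief taste of that API, the following sketch sorts the same words by code units and via Intl.Collator. It assumes an engine with German locale data (Node.js and browsers normally ship it); the word list is illustrative:

```javascript
const words = ['b', 'ä', 'a', 'z'];

// Default sort order: by UTF-16 code units ('ä' is U+00E4, so it comes last)
const codeUnitOrder = [...words].sort();
// [ 'a', 'b', 'z', 'ä' ]

// Language-sensitive order via Intl.Collator (German locale)
const collator = new Intl.Collator('de');
const germanOrder = [...words].sort(collator.compare);
// [ 'a', 'ä', 'b', 'z' ]

// .localeCompare() performs a single language-sensitive comparison:
const cmp = 'ä'.localeCompare('b', 'de'); // a negative number: 'ä' comes first
```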
Quick recap of §19 “Unicode – a brief introduction”:
Code points are the atomic parts of Unicode text. Each code point is 21 bits in size.
JavaScript strings implement Unicode via the encoding format UTF-16. It uses one or two 16-bit code units to encode a single code point.
Grapheme clusters (user-perceived characters) represent written symbols, as displayed on screen or paper. One or more code points are needed to encode a single grapheme cluster.
The following code demonstrates that a single code point comprises one or two JavaScript characters. We count the latter via .length:
// 3 code points, 3 JavaScript characters:
assert.equal('abc'.length, 3);

// 1 code point, 2 JavaScript characters:
assert.equal('🙂'.length, 2);
The following table summarizes the concepts we have just explored:
Entity | Size | Encoded via |
---|---|---|
JavaScript character (UTF-16 code unit) | 16 bits | – |
Unicode code point | 21 bits | 1–2 code units |
Unicode grapheme cluster | – | 1+ code points |
Let’s explore JavaScript’s tools for working with code points.
A Unicode code point escape lets us specify a code point hexadecimally (its value can be at most 0x10FFFF). It produces one or two JavaScript characters.
> '\u{1F642}'
'🙂'
Unicode escape sequences
In the ECMAScript language specification, Unicode code point escapes and Unicode code unit escapes (which we’ll encounter later) are called Unicode escape sequences.
String.fromCodePoint() converts a single code point to 1–2 JavaScript characters:
> String.fromCodePoint(0x1F642)
'🙂'
.codePointAt() converts 1–2 JavaScript characters to a single code point:
> '🙂'.codePointAt(0).toString(16)
'1f642'
We can iterate over a string, which visits code points (not JavaScript characters). Iteration is described later in this book. One way of iterating is via a for-of loop:
const str = '🙂a';
assert.equal(str.length, 3);

for (const codePointChar of str) {
  console.log(codePointChar);
}

// Output:
// 🙂
// a
Array.from() is also based on iteration and visits code points:
> Array.from('🙂a')
[ '🙂', 'a' ]
That makes it a good tool for counting code points:
> Array.from('🙂a').length
2
> '🙂a'.length
3
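Combining iteration with .codePointAt() lets us list the code points of a string in the usual U+XXXX notation. A minimal sketch; toCodePoints() is a hypothetical helper:

```javascript
// Hypothetical helper: lists the code points of a string as 'U+XXXX'.
// for-of visits code points, so each `ch` holds 1–2 JavaScript characters.
function toCodePoints(str) {
  const result = [];
  for (const ch of str) {
    const hex = ch.codePointAt(0).toString(16).toUpperCase();
    result.push('U+' + hex.padStart(4, '0'));
  }
  return result;
}

console.log(toCodePoints('a🙂')); // [ 'U+0061', 'U+1F642' ]
```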
Indices and lengths of strings are based on JavaScript characters (as represented by UTF-16 code units).
To specify a code unit hexadecimally, we can use a Unicode code unit escape with exactly four hexadecimal digits:
> '\uD83D\uDE42'
'🙂'
And we can use String.fromCharCode(). Char code is the standard library’s name for code unit:
> String.fromCharCode(0xD83D) + String.fromCharCode(0xDE42)
'🙂'
To get the char code of a character, use .charCodeAt():
> '🙂'.charCodeAt(0).toString(16)
'd83d'
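Where do the code units 0xD83D and 0xDE42 come from? UTF-16 encodes a code point above U+FFFF as a surrogate pair: it subtracts 0x10000 and splits the remaining 20 bits into two halves of 10 bits each. The following sketch spells out that arithmetic; toSurrogatePair() is a hypothetical helper and only handles code points in the range 0x10000–0x10FFFF:

```javascript
// Hypothetical helper: encodes an astral code point (0x10000–0x10FFFF)
// as a UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;    // 20 bits remain
  const high = 0xD800 + (offset >> 10);  // upper 10 bits: high surrogate
  const low = 0xDC00 + (offset & 0x3FF); // lower 10 bits: low surrogate
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1F642);
console.log(high.toString(16), low.toString(16)); // d83d de42
console.log(String.fromCharCode(high, low));       // 🙂
```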
If the code point of a character is below 256, we can refer to it via an ASCII escape with exactly two hexadecimal digits:
> 'He\x6C\x6Co'
'Hello'
(The official name of ASCII escapes is Hexadecimal escape sequences – it was the first escape that used hexadecimal numbers.)
When working with text that may be written in any human language, it’s best to split at the boundaries of grapheme clusters, not at the boundaries of code points.
The ECMAScript Internationalization API supports Unicode segmentation (along grapheme cluster boundaries, word boundaries, sentence boundaries, etc.) via Intl.Segmenter, which started as a TC39 proposal and has since been standardized.
On platforms that don’t support it yet, we can use one of several libraries that are available (do a web search for “JavaScript grapheme”).
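The following sketch splits a string into grapheme clusters via Intl.Segmenter. It assumes a runtime that provides Intl.Segmenter (current Node.js and major browsers do); splitGraphemes() is a hypothetical helper. The sample text ends with 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points that are displayed as a single symbol.

```javascript
// Hypothetical helper: splits a string into grapheme clusters.
function splitGraphemes(str, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  // segment() returns an iterable of objects whose .segment property
  // contains the text of one grapheme cluster.
  return Array.from(segmenter.segment(str), (s) => s.segment);
}

const text = 'cafe\u0301!';
console.log(text.length);                 // 6 JavaScript characters
console.log(Array.from(text).length);     // 6 code points
console.log(splitGraphemes(text).length); // 5 grapheme clusters
```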
Tbl. 14 describes how various values are converted to strings.
x | String(x) |
---|---|
undefined | 'undefined' |
null | 'null' |
boolean | false → 'false', true → 'true' |
number | Example: 123 → '123' |
bigint | Example: 123n → '123' |
string | x (input, unchanged) |
symbol | Example: Symbol('abc') → 'Symbol(abc)' |
object | Configurable via, e.g., toString() |
String.fromCharCode() [ES1]
.charCodeAt() [ES1]
String.fromCodePoint() [ES6]
.codePointAt() [ES6]

String.prototype: finding and matching
(String.prototype is where the methods of strings are stored.)
.endsWith(searchString: string, endPos=this.length): boolean [ES6]
Returns true if the string would end with searchString if its length were endPos. Returns false otherwise.
> 'foo.txt'.endsWith('.txt')
true
> 'abcde'.endsWith('cd', 4)
true
.includes(searchString: string, startPos=0): boolean [ES6]
Returns true if the string contains the searchString and false otherwise. The search starts at startPos.
> 'abc'.includes('b')
true
> 'abc'.includes('b', 2)
false
.indexOf(searchString: string, minIndex=0): number [ES1]
Returns the lowest index at which searchString appears within the string, or -1 otherwise. Any returned index will be minIndex or higher.
> 'abab'.indexOf('a')
0
> 'abab'.indexOf('a', 1)
2
> 'abab'.indexOf('c')
-1
.lastIndexOf(searchString: string, maxIndex=Infinity): number [ES1]
Returns the highest index at which searchString appears within the string, or -1 otherwise. Any returned index will be maxIndex or lower.
> 'abab'.lastIndexOf('ab', 2)
2
> 'abab'.lastIndexOf('ab', 1)
0
> 'abab'.lastIndexOf('ab')
2
[1 of 2] .match(regExp: string | RegExp): RegExpMatchArray | null [ES3]
If regExp is a regular expression with flag /g not set, then .match() returns the first match for regExp within the string, or null if there is no match. If regExp is a string, it is used to create a regular expression (think parameter of new RegExp()) before performing the previously mentioned steps.
The result has the following type:
interface RegExpMatchArray extends Array<string> {
  index: number;
  input: string;
  groups: undefined | {
    [key: string]: string
  };
}
Numbered capture groups become Array indices (which is why this type extends Array). Named capture groups (ES2018) become properties of .groups. In this mode, .match() works like RegExp.prototype.exec().
Examples:
> 'ababb'.match(/a(b+)/)
{ 0: 'ab', 1: 'b', index: 0, input: 'ababb', groups: undefined }
> 'ababb'.match(/a(?<foo>b+)/)
{ 0: 'ab', 1: 'b', index: 0, input: 'ababb', groups: { foo: 'b' } }
> 'abab'.match(/x/)
null
[2 of 2] .match(regExp: RegExp): string[] | null [ES3]
If flag /g of regExp is set, .match() returns either an Array with all matches or null if there was no match.
> 'ababb'.match(/a(b+)/g)
[ 'ab', 'abb' ]
> 'ababb'.match(/a(?<foo>b+)/g)
[ 'ab', 'abb' ]
> 'abab'.match(/x/g)
null
.search(regExp: string | RegExp): number [ES3]
Returns the index at which regExp first occurs within the string, or -1 if there is no match. If regExp is a string, it is used to create a regular expression (think parameter of new RegExp()).
> 'a2b'.search(/[0-9]/)
1
> 'a2b'.search('[0-9]')
1
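Two consequences are worth seeing side by side: the result when nothing matches, and the fact that a string argument is interpreted as a regular expression (unlike with .indexOf()). A short sketch:

```javascript
const noMatch = 'abc'.search(/[0-9]/); // -1

// A string argument is converted to a regular expression, so '.'
// matches any character instead of a literal dot:
const viaSearch = 'abc.'.search('.');   // 0
const viaIndexOf = 'abc.'.indexOf('.'); // 3 (literal search)
```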
.startsWith(searchString: string, startPos=0): boolean [ES6]
Returns true if searchString occurs in the string at index startPos. Returns false otherwise.
> '.gitignore'.startsWith('.')
true
> 'abcde'.startsWith('bc', 1)
true
String.prototype: extracting
.slice(start=0, end=this.length): string [ES3]
Returns the substring of the string that starts at (including) index start and ends at (excluding) index end. If an index is negative, it is added to .length before it is used (-1 becomes this.length-1, etc.).
> 'abc'.slice(1, 3)
'bc'
> 'abc'.slice(1)
'bc'
> 'abc'.slice(-2)
'bc'
.at(index: number): string | undefined [ES2022]
Returns the JavaScript character at index as a string (or undefined if there is no character at that index). If index is negative, it is added to .length before it is used (-1 becomes this.length-1, etc.).
> 'abc'.at(0)
'a'
> 'abc'.at(-1)
'c'
.split(separator: string | RegExp, limit?: number): string[] [ES3]
Splits the string into an Array of substrings – the strings that occur between the separators. The separator can be a string:
> 'a | b | c'.split('|')
[ 'a ', ' b ', ' c' ]
It can also be a regular expression:
> 'a : b : c'.split(/ *: */)
[ 'a', 'b', 'c' ]
> 'a : b : c'.split(/( *):( *)/)
[ 'a', ' ', ' ', 'b', ' ', ' ', 'c' ]
The last invocation demonstrates that captures made by groups in the regular expression become elements of the returned Array.
Warning: .split('') splits a string into JavaScript characters. That doesn’t work well when dealing with astral code points (which are encoded as two JavaScript characters). For example, emojis are astral:
> '🙂X🙂'.split('')
[ '\uD83D', '\uDE42', 'X', '\uD83D', '\uDE42' ]
Instead, it is better to use Array.from() (or spreading):
> Array.from('🙂X🙂')
[ '🙂', 'X', '🙂' ]
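Reversing a string is a common task where this difference becomes visible: reversing JavaScript characters swaps the two halves of each surrogate pair, whereas reversing code points keeps emojis intact. A sketch (reversing code points still breaks sequences such as a letter plus a combining accent, which would require grapheme clusters):

```javascript
const text = 'ab🙂';

// Reversing JavaScript characters puts the surrogates in the wrong order:
const byCharacters = text.split('').reverse().join('');
// '\uDE42\uD83Dba' (a broken emoji)

// Reversing code points keeps the emoji intact:
const byCodePoints = Array.from(text).reverse().join('');
// '🙂ba'
```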
.substring(start: number, end=this.length): string [ES1]
Use .slice() instead of this method. .substring() wasn’t implemented consistently in older engines and doesn’t support negative indices.
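The following sketch contrasts the two methods. Besides its handling of negative indices, .substring() also swaps its arguments if start is greater than end, which can hide bugs:

```javascript
// .slice() adds .length to negative indices; .substring() treats them as 0:
const sliced = 'abcde'.slice(-2);          // 'de'
const substringed = 'abcde'.substring(-2); // 'abcde'

// .substring() swaps start and end if start > end; .slice() doesn't:
const swapped = 'abcde'.substring(3, 1);   // 'bc'
const empty = 'abcde'.slice(3, 1);         // ''
```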
String.prototype: combining
.concat(...strings: string[]): string [ES3]
Returns the concatenation of the string and strings. 'a'.concat('b') is equivalent to 'a'+'b'. The latter is much more popular.
> 'ab'.concat('cd', 'ef', 'gh')
'abcdefgh'
.padEnd(len: number, fillString=' '): string [ES2017]
Appends (fragments of) fillString to the string until it has the desired length len. If it already has or exceeds len, then it is returned without any changes.
> '#'.padEnd(2)
'# '
> 'abc'.padEnd(2)
'abc'
> '#'.padEnd(5, 'abc')
'#abca'
.padStart(len: number, fillString=' '): string [ES2017]
Prepends (fragments of) fillString to the string until it has the desired length len. If it already has or exceeds len, then it is returned without any changes.
> '#'.padStart(2)
' #'
> 'abc'.padStart(2)
'abc'
> '#'.padStart(5, 'abc')
'abca#'
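A typical use of the two padding methods is aligning text in columns: left-align labels via .padEnd() and right-align numbers via .padStart(). A sketch; the data and the column widths 8 and 4 are arbitrary choices for this example:

```javascript
const rows = [['apples', 3], ['bananas', 12], ['kiwis', 150]];

// Labels are left-aligned (padded at the end),
// counts are right-aligned (padded at the start).
const lines = rows.map(
  ([label, count]) => label.padEnd(8) + '|' + String(count).padStart(4)
);
console.log(lines.join('\n'));
// Output:
// apples  |   3
// bananas |  12
// kiwis   | 150
```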
.repeat(count=0): string [ES6]
Returns the string, concatenated count times.
> '*'.repeat()
''
> '*'.repeat(3)
'***'
String.prototype: transforming
.normalize(form: 'NFC'|'NFD'|'NFKC'|'NFKD' = 'NFC'): string [ES6]
Normalizes the string according to the Unicode Normalization Forms.
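Normalization matters because the same text can be encoded via different code points. For example, 'é' can be a single code point (U+00E9) or 'e' followed by U+0301 COMBINING ACUTE ACCENT. A short sketch:

```javascript
const composed = '\u00E9';    // 'é' as one code point
const decomposed = 'e\u0301'; // 'e' + combining acute accent

const equalAsIs = composed === decomposed;                 // false
const equalAfterNfc = composed === decomposed.normalize(); // true (NFC composes)
const equalAfterNfd = composed.normalize('NFD') === decomposed; // true (NFD decomposes)
const lengths = [composed.length, decomposed.length];      // [1, 2]
```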
[1 of 2] .replaceAll(searchValue: string | RegExp, replaceValue: string): string [ES2021]
What to do if you can’t use .replaceAll()
If .replaceAll() isn’t available on your targeted platform, you can use .replace() instead. How is explained in §43.6.8.1 “str.replace(searchValue, replacementValue) [ES3]”.
Replaces all matches of searchValue with replaceValue. If searchValue is a regular expression without flag /g, a TypeError is thrown.
> 'x.x.'.replaceAll('.', '#')
'x#x#'
> 'x.x.'.replaceAll(/./g, '#')
'####'
> 'x.x.'.replaceAll(/./, '#')
TypeError: String.prototype.replaceAll called with
a non-global RegExp argument
Special characters in replaceValue are:
$$: becomes $
$n: becomes the capture of numbered group n (alas, $0 stands for the string '$0'; it does not refer to the complete match)
$&: becomes the complete match
$`: becomes everything before the match
$': becomes everything after the match
Examples:
> 'a 1995-12 b'.replaceAll(/([0-9]{4})-([0-9]{2})/g, '|$2|')
'a |12| b'
> 'a 1995-12 b'.replaceAll(/([0-9]{4})-([0-9]{2})/g, '|$&|')
'a |1995-12| b'
> 'a 1995-12 b'.replaceAll(/([0-9]{4})-([0-9]{2})/g, '|$`|')
'a |a | b'
Named capture groups (ES2018) are supported, too:
$<name> becomes the capture of named group name
Example:
assert.equal(
  'a 1995-12 b'.replaceAll(
    /(?<year>[0-9]{4})-(?<month>[0-9]{2})/g, '|$<month>|'),
  'a |12| b');
[2 of 2] .replaceAll(searchValue: string | RegExp, replacer: (...args: any[]) => string): string [ES2021]
If the second parameter is a function, occurrences are replaced with the strings it returns. Its parameters args are:
matched: string. The complete match
g1: string|undefined. The capture of numbered group 1
g2: string|undefined. The capture of numbered group 2
offset: number. Where was the match found in the input string?
input: string. The whole input string

const regexp = /([0-9]{4})-([0-9]{2})/g;
const replacer = (all, year, month) => '|' + all + '|';
assert.equal(
  'a 1995-12 b'.replaceAll(regexp, replacer),
  'a |1995-12| b');
Named capture groups (ES2018) are supported, too. If there are any, an argument is added at the end with an object whose properties contain the captures:
const regexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})/g;
const replacer = (...args) => {
  const groups = args.pop();
  return '|' + groups.month + '|';
};
assert.equal(
  'a 1995-12 b'.replaceAll(regexp, replacer),
  'a |12| b');
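The offset parameter is useful when the replacement depends on where a match occurs. The following sketch tags each match with its position. Note that the position of offset among the arguments depends on the number of capture groups: with a string searchValue there are none, so offset is the second argument.

```javascript
// No capture groups: the arguments are (match, offset, input).
const tagged = 'a-b-c'.replaceAll('-', (match, offset) => `[${offset}]`);
// 'a[1]b[3]c'

// Two capture groups: the arguments are (match, g1, g2, offset, input).
const swapped = '1995-12 and 2001-03'.replaceAll(
  /([0-9]{4})-([0-9]{2})/g,
  (all, year, month, offset) => `${month}/${year}@${offset}`
);
// '12/1995@0 and 03/2001@12'
```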
.replace(searchValue: string | RegExp, replaceValue: string): string [ES3]
.replace(searchValue: string | RegExp, replacer: (...args: any[]) => string): string [ES3]
.replace() works like .replaceAll(), but only replaces the first occurrence if searchValue is a string or a regular expression without /g:
> 'x.x.'.replace('.', '#')
'x#x.'
> 'x.x.'.replace(/./, '#')
'#.x.'
For more information on this method, see §43.6.8.1 “str.replace(searchValue, replacementValue) [ES3]”.
.toUpperCase(): string [ES1]
Returns a copy of the string in which all lowercase alphabetic characters are converted to uppercase. How well that works for various alphabets depends on the JavaScript engine.
> '-a2b-'.toUpperCase()
'-A2B-'
> 'αβγ'.toUpperCase()
'ΑΒΓ'
.toLowerCase(): string [ES1]
Returns a copy of the string in which all uppercase alphabetic characters are converted to lowercase. How well that works for various alphabets depends on the JavaScript engine.
> '-A2B-'.toLowerCase()
'-a2b-'
> 'ΑΒΓ'.toLowerCase()
'αβγ'
.trim(): string [ES5]
Returns a copy of the string in which all leading and trailing whitespace (spaces, tabs, line terminators, etc.) is gone.
> '\r\n#\t '.trim()
'#'
> ' abc '.trim()
'abc'
.trimEnd(): string [ES2019]
Similar to .trim() but only the end of the string is trimmed:
> ' abc '.trimEnd()
' abc'
.trimStart(): string [ES2019]
Similar to .trim() but only the beginning of the string is trimmed:
> ' abc '.trimStart()
'abc '
Exercise: Using string methods
exercises/strings/remove_extension_test.mjs
Quiz
See quiz app.