Availability of features
Unless stated otherwise, each regular expression feature has been available since ES3.
The two main ways of creating regular expressions are:
Literal: compiled statically (at load time).
/abc/ui
Constructor: compiled dynamically (at runtime).
new RegExp('abc', 'ui')
Both regular expressions have the same two parts:
The body abc: the actual regular expression.
The flags u and i. Flags configure how the pattern is interpreted. For example, i enables case-insensitive matching. A list of available flags is given later in this chapter.
There are two variants of the constructor RegExp():
new RegExp(pattern : string, flags = '') [ES3]
A new regular expression is created as specified via pattern. If flags is missing, the empty string '' is used.
new RegExp(regExp : RegExp, flags = regExp.flags) [ES6]
regExp is cloned. If flags is provided, then it determines the flags of the clone.
The second variant is useful for cloning regular expressions, optionally while modifying them. Flags are immutable and this is the only way of changing them – for example:
function copyAndAddFlags(regExp, flagsToAdd='') {
  // The constructor doesn’t allow duplicate flags;
  // make sure there aren’t any:
  const newFlags = Array.from(
    new Set(regExp.flags + flagsToAdd)
  ).join('');
  return new RegExp(regExp, newFlags);
}
assert.equal(/abc/i.flags, 'i');
assert.equal(copyAndAddFlags(/abc/i, 'g').flags, 'gi');
At the top level of a regular expression, the following syntax characters are special. They are escaped by prefixing a backslash (\).
\ ^ $ . * + ? ( ) [ ] { } |
In regular expression literals, we must escape slashes:
> /\//.test('/')
true
In the argument of new RegExp(), we don’t have to escape slashes:
> new RegExp('/').test('/')
true
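A related caveat when using the constructor: the pattern is given as a string, so every backslash meant for the regular expression must itself be escaped in a string literal (or the string must be written via String.raw). The following is a minimal sketch; the variable names are only illustrative.

```javascript
// In a string literal, '\d' is just 'd'. The backslash must be doubled
// so that it reaches the regular expression.
const viaLiteral = /^\d+$/;
const viaString = new RegExp('^\\d+$');
const viaRaw = new RegExp(String.raw`^\d+$`);

console.log(viaLiteral.source === viaString.source); // true
console.log(viaRaw.test('2024')); // true

// Forgetting to double the backslash silently changes the pattern to /^d+$/:
console.log(new RegExp('^\d+$').test('2024')); // false
```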
Atoms are the basic building blocks of regular expressions.
Pattern characters are all characters except syntax characters (^, $, etc.). Pattern characters match themselves. Examples: A b %
. matches any character. We can use the flag /s (dotAll) to control if the dot matches line terminators or not (more below).
Character escapes (each escape matches a single fixed character):
\f: form feed (FF)
\n: line feed (LF)
\r: carriage return (CR)
\t: character tabulation
\v: line tabulation
Arbitrary control characters: \cA (Ctrl-A), …, \cZ (Ctrl-Z)
Unicode code units: \u00E4
Unicode code points (require flag /u): \u{1F44D}
Character class escapes (each escape matches one out of a set of characters):
\d: digits (same as [0-9])
\D: non-digits
\w: “word” characters (same as [A-Za-z0-9_], related to identifiers in programming languages)
\W: non-word characters
\s: whitespace (space, tab, line terminators, etc.)
\S: non-whitespace
Unicode property escapes: \p{White_Space}, \P{White_Space}, etc. They require the flag /u.
In the Unicode standard, each character has properties – metadata describing it. Properties play an important role in defining the nature of a character. Quoting the Unicode Standard, Sect. 3.3, D3:
The semantics of a character are determined by its identity, normative properties, and behavior.
These are a few examples of properties:
Name: a unique name, composed of uppercase letters, digits, hyphens, and spaces – for example:
A: Name = LATIN CAPITAL LETTER A
🙂: Name = SLIGHTLY SMILING FACE
General_Category: categorizes characters – for example:
x: General_Category = Lowercase_Letter
$: General_Category = Currency_Symbol
White_Space: used for marking invisible spacing characters, such as spaces, tabs and newlines – for example:
\t: White_Space = True
π: White_Space = False
Age: version of the Unicode Standard in which a character was introduced – for example: The Euro sign € was added in version 2.1 of the Unicode standard.
€: Age = 2.1
Block: a contiguous range of code points. Blocks don’t overlap and their names are unique. For example:
S: Block = Basic_Latin (range U+0000..U+007F)
🙂: Block = Emoticons (range U+1F600..U+1F64F)
Script: a collection of characters used by one or more writing systems.
α: Script = Greek
Д: Script = Cyrillic
Unicode property escapes look like this:
1. \p{prop=value}: matches all characters whose property prop has the value value.
2. \P{prop=value}: matches all characters that do not have a property prop whose value is value.
3. \p{bin_prop}: matches all characters whose binary property bin_prop is True.
4. \P{bin_prop}: matches all characters whose binary property bin_prop is False.
Comments:
We can only use Unicode property escapes if the flag /u is set. Without /u, \p is the same as p.
Forms (3) and (4) can be used as abbreviations if the property is General_Category. For example, the following two escapes are equivalent:
\p{Uppercase_Letter}
\p{General_Category=Uppercase_Letter}
Examples:
Checking for whitespace:
> /^\p{White_Space}+$/u.test('\t \n\r')
true
Checking for Greek letters:
> /^\p{Script=Greek}+$/u.test('μετά')
true
Deleting any letters:
> '1π2ü3é4'.replace(/\p{Letter}/ug, '')
'1234'
Deleting lowercase letters:
> 'AbCdEf'.replace(/\p{Lowercase_Letter}/ug, '')
'ACE'
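As a further illustration, property escapes let us extract words independently of the script they are written in. This is only a sketch; the sample strings and variable names are made up for the example.

```javascript
// Extract runs of letters, whatever their script (requires flag /u)
const words = 'straße, мир, αβγ!'.match(/\p{Letter}+/gu);
console.log(words); // [ 'straße', 'мир', 'αβγ' ]

// A binary property: the no-break space U+00A0 also counts as White_Space
const underscored = 'a\u00A0b c'.replace(/\p{White_Space}/gu, '_');
console.log(underscored); // 'a_b_c'

// Negated form \P{…} with a General_Category value
const digitsOnly = 'x1y2'.replace(/\P{Decimal_Number}/gu, '');
console.log(digitsOnly); // '12'
```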
A character class wraps class ranges in square brackets. The class ranges specify a set of characters:
[«class ranges»] matches any character in the set.
[^«class ranges»] matches any character not in the set.
Rules for class ranges:
Non-syntax characters stand for themselves: [abc]
Only the following four characters are special and must be escaped via backslashes:
^ \ - ]
^ only has to be escaped if it comes first.
- need not be escaped if it comes first or last.
Character escapes (\n, \u{1F44D}, etc.) have the usual meaning.
Inside a character class, \b stands for backspace. Elsewhere in a regular expression, it matches word boundaries.
Character class escapes (\d, \p{White_Space}, etc.) have the usual meaning.
Ranges of characters are specified via dashes: [a-z]
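Positional capture group: (#+)
Backreference: \1, \2, etc.
Named capture group [ES2018]: (?<hashes>#+)
Backreference: \k<hashes>
Noncapturing group: (?:#+)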
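To make the three kinds of groups more concrete, here is a small sketch (the regular expressions and names are illustrative, not taken from the text above). A backreference only matches the exact text that its group captured.

```javascript
// \1 must repeat whatever group 1 captured: the same kind of quote
const sameQuotes = /^(['"]).*\1$/;
console.log(sameQuotes.test(`'abc'`)); // true
console.log(sameQuotes.test(`'abc"`)); // false

// The same check with a named group and a named backreference
const sameQuotesNamed = /^(?<quote>['"]).*\k<quote>$/;
const quoteUsed = sameQuotesNamed.exec('"abc"').groups.quote;
console.log(quoteUsed); // '"'

// A noncapturing group only groups; it creates no capture
const matchLength = /^(?:ab)+$/.exec('abab').length;
console.log(matchLength); // 1 (only the complete match at index 0)
```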
By default, all of the following quantifiers are greedy (they match as many characters as possible):
?: match never or once
*: match zero or more times
+: match one or more times
{n}: match n times
{n,}: match n or more times
{n,m}: match at least n times, at most m times.
To make them reluctant (so that they match as few characters as possible), put question marks (?) after them:
> /".*"/.exec('"abc"def"')[0] // greedy
'"abc"def"'
> /".*?"/.exec('"abc"def"')[0] // reluctant
'"abc"'
^ matches only at the beginning of the input
$ matches only at the end of the input
\b matches only at a word boundary
\B matches only when not at a word boundary
Positive lookahead: (?=«pattern») matches if pattern matches what comes next.
Example: sequences of lowercase letters that are followed by an X.
> 'abcX def'.match(/[a-z]+(?=X)/g)
[ 'abc' ]
Note that the X itself is not part of the matched substring.
Negative lookahead: (?!«pattern») matches if pattern does not match what comes next.
Example: sequences of lowercase letters that are not followed by an X.
> 'abcX def'.match(/[a-z]+(?!X)/g)
[ 'ab', 'def' ]
Positive lookbehind: (?<=«pattern») matches if pattern matches what came before.
Example: sequences of lowercase letters that are preceded by an X.
> 'Xabc def'.match(/(?<=X)[a-z]+/g)
[ 'abc' ]
Negative lookbehind: (?<!«pattern») matches if pattern does not match what came before.
Example: sequences of lowercase letters that are not preceded by an X.
> 'Xabc def'.match(/(?<!X)[a-z]+/g)
[ 'bc', 'def' ]
Example: replace “.js” with “.html”, but not in “Node.js”.
> 'Node.js: index.js and main.js'.replace(/(?<!Node)\.js/g, '.html')
'Node.js: index.html and main.html'
Disjunction (|)
Caveat: this operator has low precedence. Use groups if necessary:
^aa|zz$ matches all strings that start with aa and/or end with zz. Note that | has a lower precedence than ^ and $.
^(aa|zz)$ matches the two strings 'aa' and 'zz'.
^a(a|z)z$ matches the two strings 'aaz' and 'azz'.
Literal flag | Property name | ES | Description |
---|---|---|---|
d | hasIndices | ES2022 | Switch on match indices |
g | global | ES3 | Match multiple times |
i | ignoreCase | ES3 | Match case-insensitively |
m | multiline | ES3 | ^ and $ match per line |
s | dotAll | ES2018 | Dot matches line terminators |
u | unicode | ES6 | Unicode mode (recommended) |
y | sticky | ES6 | No characters between matches |
The following regular expression flags are available in JavaScript (tbl. 21 provides a compact overview):
/d (.hasIndices): Some RegExp-related methods return match objects that describe where the regular expression matched in an input string. If this flag is on, each match object includes match indices which tell us where each group capture starts and ends. More information: §43.5.1 “Match indices in match objects [ES2022]”.
/g (.global) fundamentally changes how the following methods work:
RegExp.prototype.test()
RegExp.prototype.exec()
String.prototype.match()
How is explained in §43.7 “The flags /g and /y, and the property .lastIndex”. In a nutshell, without /g, the methods only consider the first match for a regular expression in an input string. With /g, they consider all matches.
/i (.ignoreCase) switches on case-insensitive matching:
> /a/.test('A')
false
> /a/i.test('A')
true
/m (.multiline): If this flag is on, ^ matches the beginning of each line and $ matches the end of each line. If it is off, ^ matches the beginning of the whole input string and $ matches the end of the whole input string.
> 'a1\na2\na3'.match(/^a./gm)
[ 'a1', 'a2', 'a3' ]
> 'a1\na2\na3'.match(/^a./g)
[ 'a1' ]
/u (.unicode): This flag switches on the Unicode mode for a regular expression. That mode is explained in the next subsection.
/y (.sticky): This flag mainly makes sense in conjunction with /g. When both are switched on, any match must directly follow the previous one (that is, it must start at index .lastIndex of the regular expression object). Therefore, the first match must be at index 0.
> 'a1a2 a3'.match(/a./gy)
[ 'a1', 'a2' ]
> '_a1a2 a3'.match(/a./gy) // first match must be at index 0
null
> 'a1a2 a3'.match(/a./g)
[ 'a1', 'a2', 'a3' ]
> '_a1a2 a3'.match(/a./g)
[ 'a1', 'a2', 'a3' ]
The main use case for /y is tokenization (during parsing). More information on this flag: §43.7 “The flags /g and /y, and the property .lastIndex”.
/s (.dotAll): By default, the dot does not match line terminators. With this flag, it does:
> /./.test('\n')
false
> /./s.test('\n')
true
Workaround: If /s isn’t supported, we can use [^] instead of a dot.
> /[^]/.test('\n')
true
Consider the following regular expression: /“([^”]+)”/udg
In which order should we list its flags? Two options are:
1. Alphabetical order: /dgu
2. In order of importance (arguably, /u is most fundamental etc.): /ugd
Given that (2) is not obvious, (1) is the better choice. JavaScript also uses it for the RegExp property .flags:
> /a/ismudgy.flags
'dgimsuy'
The flag /u: Unicode mode
The flag /u switches on a special Unicode mode for regular expressions. That mode enables several features:
In patterns, we can use Unicode code point escapes such as \u{1F42A} to specify characters. Code unit escapes such as \u03B1 only have a range of four hexadecimal digits (which corresponds to the basic multilingual plane).
In patterns, we can use Unicode property escapes such as \p{White_Space}.
Many escapes are now forbidden. For example: \a \- \:
Pattern characters always match themselves:
> /pa-:/.test('pa-:')
true
Without /u, there are some pattern characters that still match themselves if we escape them with backslashes:
> /\p\a\-\:/.test('pa-:')
true
With /u:
\p starts a Unicode property escape.
The other escapes shown above (\a, \-, \:) cause syntax errors.
The atomic units for matching are Unicode characters (code points), not JavaScript characters (code units).
The following subsections explain the last item in more detail. They use the following Unicode character to explain when the atomic units are Unicode characters and when they are JavaScript characters:
const codePoint = '🙂';
const codeUnits = '\uD83D\uDE42'; // UTF-16
assert.equal(codePoint, codeUnits); // same string!
I’m only switching between 🙂 and \uD83D\uDE42 to illustrate how JavaScript sees things. Both are equivalent and can be used interchangeably in strings and regular expressions.
With /u, the two code units of 🙂 are treated as a single character:
> /^[🙂]$/u.test('🙂')
true
Without /u, 🙂 is treated as two characters:
> /^[\uD83D\uDE42]$/.test('\uD83D\uDE42')
false
> /^[\uD83D\uDE42]$/.test('\uDE42')
true
Note that ^ and $ demand that the input string have a single character. That’s why the first result is false.
The dot operator (.) matches Unicode characters, not JavaScript characters
With /u, the dot operator matches Unicode characters:
> '🙂'.match(/./gu).length
1
.match() plus /g returns an Array with all the matches of a regular expression.
Without /u, the dot operator matches JavaScript characters:
> '\uD83D\uDE42'.match(/./g).length
2
With /u, a quantifier applies to the whole preceding Unicode character:
> /^🙂{3}$/u.test('🙂🙂🙂')
true
Without /u, a quantifier only applies to the preceding JavaScript character:
> /^\uD83D\uDE42{3}$/.test('\uD83D\uDE42\uDE42\uDE42')
true
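The practical effect of these differences can be summarized in a short sketch. It assumes a sample string with one character outside the basic multilingual plane; the variable names are illustrative.

```javascript
const sample = 'a🙂b';

// Code units vs. code points
const codeUnitCount = sample.length;                 // 4
const dotWithU = sample.match(/./gu).length;         // 3
const dotWithoutU = sample.match(/./g).length;       // 4

// A pattern for “exactly three characters” only works as intended with /u
const threeWithU = /^.{3}$/u.test(sample);           // true
const threeWithoutU = /^.{3}$/.test(sample);         // false

console.log(codeUnitCount, dotWithU, dotWithoutU, threeWithU, threeWithoutU);
```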
Noteworthy:
.lastIndex is a real instance property. All other properties are implemented via getters.
.lastIndex is the only mutable property. All other properties are read-only. If we want to change them, we need to copy the regular expression (consult §43.1.2 “Cloning and non-destructively modifying regular expressions” for details).
Each regular expression flag exists as a property with a longer, more descriptive name:
> /a/i.ignoreCase
true
> /a/.ignoreCase
false
This is the complete list of flag properties:
.dotAll (/s)
.global (/g)
.hasIndices (/d)
.ignoreCase (/i)
.multiline (/m)
.sticky (/y)
.unicode (/u)
Each regular expression also has the following properties:
.source [ES3]: The regular expression pattern
> /abc/ig.source
'abc'
.flags [ES6]: The flags of the regular expression
> /abc/ig.flags
'gi'
.lastIndex [ES3]: Used when flag /g is switched on. Consult §43.7 “The flags /g and /y, and the property .lastIndex” for details.
Several regular expression-related methods return so-called match objects to provide detailed information for the locations where a regular expression matches an input string. These methods are:
RegExp.prototype.exec() returns null or single match objects.
String.prototype.match() returns null or single match objects (if flag /g is not set).
String.prototype.matchAll() returns an iterable of match objects (flag /g must be set; otherwise, an exception is thrown).
This is an example:
assert.deepEqual(
  /(a+)b/d.exec('ab aaab'),
  {
    0: 'ab',
    1: 'a',
    index: 0,
    input: 'ab aaab',
    groups: undefined,
    indices: {
      0: [0, 2],
      1: [0, 1],
      groups: undefined
    },
  }
);
The result of .exec() is a match object for the first match with the following properties:
[0]: the complete substring matched by the regular expression
[1]: capture of numbered group 1 (etc.)
.index: where did the match occur?
.input: the string that was matched against
.groups: captures of named groups (see §43.6.4.2 “Named capture groups [ES2018]”)
.indices: the index ranges of captured groups. This property only exists if the flag /d is switched on.
Match indices are a feature of match objects: If we turn it on via the regular expression flag /d (property .hasIndices), they record the start and end indices of where groups were captured.
This is how we access the captures of numbered groups:
const matchObj = /(a+)(b+)/d.exec('aaaabb');
assert.equal(
  matchObj[1], 'aaaa'
);
assert.equal(
  matchObj[2], 'bb'
);
Due to the regular expression flag /d, matchObj also has a property .indices that records for each numbered group where it was captured in the input string:
assert.deepEqual(
  matchObj.indices[1], [0, 4]
);
assert.deepEqual(
  matchObj.indices[2], [4, 6]
);
The captures of named groups are accessed like this:
const matchObj = /(?<as>a+)(?<bs>b+)/d.exec('aaaabb');
assert.equal(
  matchObj.groups.as, 'aaaa');
assert.equal(
  matchObj.groups.bs, 'bb');
Their indices are stored in matchObj.indices.groups:
assert.deepEqual(
  matchObj.indices.groups.as, [0, 4]);
assert.deepEqual(
  matchObj.indices.groups.bs, [4, 6]);
One important use case for match indices is parsers that point to where exactly a syntactic error is located. The following code solves a related problem: It points to where quoted content starts and where it ends (see demonstration at the end).
const reQuoted = /“([^”]+)”/dgu;
function pointToQuotedText(str) {
  const startIndices = new Set();
  const endIndices = new Set();
  for (const match of str.matchAll(reQuoted)) {
    const [start, end] = match.indices[1];
    startIndices.add(start);
    endIndices.add(end);
  }
  let result = '';
  for (let index=0; index < str.length; index++) {
    if (startIndices.has(index)) {
      result += '[';
    } else if (endIndices.has(index+1)) {
      result += ']';
    } else {
      result += ' ';
    }
  }
  return result;
}
assert.equal(
  pointToQuotedText(
    'They said “hello” and “goodbye”.'),
    '           [   ]       [     ]  '
);
By default, regular expressions match anywhere in a string:
> /a/.test('__a__')
true
We can change that by using assertions such as ^ or by using the flag /y:
> /^a/.test('__a__')
false
> /^a/.test('a__')
true
regExp.test(str): is there a match? [ES3]
The regular expression method .test() returns true if regExp matches str:
> /bc/.test('ABCD')
false
> /bc/i.test('ABCD')
true
> /\.mjs$/.test('main.mjs')
true
With .test() we should normally avoid the /g flag. If we use it, we generally don’t get the same result every time we call the method:
> const r = /a/g;
> r.test('aab')
true
> r.test('aab')
true
> r.test('aab')
false
The results are due to /a/ having two matches in the string. After all of those were found, .test() returns false.
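The following sketch shows the same effect with a single-match input and two possible ways of avoiding it: leaving out /g or resetting .lastIndex before each call. The variable names are illustrative.

```javascript
// With /g, .test() remembers where the previous match ended
const stateful = /a/g;
const first = stateful.test('a');  // true; .lastIndex is now 1
const second = stateful.test('a'); // false; matching started at index 1

// Option 1: no /g, so every call starts at the beginning
const stateless = /a/;
const repeated = [stateless.test('a'), stateless.test('a')];

// Option 2: keep /g, but reset .lastIndex before testing
stateful.lastIndex = 0;
const afterReset = stateful.test('a');

console.log(first, second, repeated, afterReset);
```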
str.search(regExp): at what index is the match? [ES3]
The string method .search() returns the first index of str at which there is a match for regExp:
> '_abc_'.search(/abc/)
1
> 'main.mjs'.search(/\.mjs$/)
4
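If there is no match, .search() returns -1. A small sketch of a helper that handles this case (indexOfFirstDigit is a hypothetical name chosen for this example):

```javascript
// Returns the index of the first digit, or undefined if there is none
function indexOfFirstDigit(str) {
  const index = str.search(/[0-9]/);
  return index === -1 ? undefined : index;
}

console.log(indexOfFirstDigit('abc7')); // 3
console.log(indexOfFirstDigit('abc'));  // undefined
```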
regExp.exec(str): capturing groups [ES3]
Without the flag /g, .exec() returns a match object for the first match of regExp in str:
assert.deepEqual(
  /(a+)b/.exec('ab aab'),
  {
    0: 'ab',
    1: 'a',
    index: 0,
    input: 'ab aab',
    groups: undefined,
  }
);
The previous example contained a single numbered group. The following example demonstrates named groups:
assert.deepEqual(
  /(?<as>a+)b/.exec('ab aab'),
  {
    0: 'ab',
    1: 'a',
    index: 0,
    input: 'ab aab',
    groups: { as: 'a' },
  }
);
In the result of .exec(), we can see that a named group is also a numbered group – its capture exists twice:
Once as a numbered capture (property '1').
Once as a named capture (property groups.as).
Better alternative for retrieving all matches: str.matchAll(regExp) [ES2020]
Since ECMAScript 2020, JavaScript has another method for retrieving all matches: str.matchAll(regExp). This method is easier to use and has fewer caveats.
If we want to retrieve all matches of a regular expression (not just the first one), we need to switch on the flag /g. Then we can call .exec() multiple times and get one match each time. After the last match, .exec() returns null.
> const regExp = /(a+)b/g;
> regExp.exec('ab aab')
{ 0: 'ab', 1: 'a', index: 0, input: 'ab aab', groups: undefined }
> regExp.exec('ab aab')
{ 0: 'aab', 1: 'aa', index: 3, input: 'ab aab', groups: undefined }
> regExp.exec('ab aab')
null
Therefore, we can loop over all matches as follows:
const regExp = /(a+)b/g;
const str = 'ab aab';
let match;
// Check for null via truthiness
// Alternative: while ((match = regExp.exec(str)) !== null)
while (match = regExp.exec(str)) {
  console.log(match[1]);
}
// Output:
// 'a'
// 'aa'
Be careful when sharing regular expressions with /g!
Sharing regular expressions with /g has a few pitfalls, which are explained later.
Exercise: Extracting quoted text via .exec()
exercises/regexps/extract_quoted_test.mjs
str.match(regExp): getting all group 0 captures [ES3]
Without /g, .match() works like .exec() – it returns a single match object.
With /g, .match() returns all substrings of str that match regExp:
> 'ab aab'.match(/(a+)b/g)
[ 'ab', 'aab' ]
If there is no match, .match() returns null:
> 'xyz'.match(/(a+)b/g)
null
We can use the nullish coalescing operator (??) to protect ourselves against null:
const numberOfMatches = (str.match(regExp) ?? []).length;
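Wrapped in a function, that idea might look as follows. This is a sketch: countMatches is a hypothetical helper, and the check for /g is an added assumption, because without /g, .match() returns a match object whose length reflects the number of groups, not the number of matches.

```javascript
function countMatches(str, regExp) {
  if (!regExp.global) {
    throw new TypeError('Flag /g must be set!');
  }
  // .match() returns null if there are no matches
  return (str.match(regExp) ?? []).length;
}

console.log(countMatches('ab aab', /(a+)b/g)); // 2
console.log(countMatches('xyz', /(a+)b/g));    // 0
```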
str.matchAll(regExp): getting an iterable over all match objects [ES2020]
This is how .matchAll() is invoked:
const matchIterable = str.matchAll(regExp);
Given a string and a regular expression, .matchAll() returns an iterable over the match objects of all matches.
In the following example, we use Array.from() to convert iterables to Arrays so that we can compare them better.
> Array.from('-a-a-a'.matchAll(/-(a)/ug))
[
  { 0:'-a', 1:'a', index: 0, input: '-a-a-a', groups: undefined },
  { 0:'-a', 1:'a', index: 2, input: '-a-a-a', groups: undefined },
  { 0:'-a', 1:'a', index: 4, input: '-a-a-a', groups: undefined },
]
Flag /g must be set:
> Array.from('-a-a-a'.matchAll(/-(a)/u))
TypeError: String.prototype.matchAll called with a non-global RegExp argument
.matchAll() starts matching at regExp.lastIndex but doesn’t change that property.
Implementing .matchAll()
.matchAll() could be implemented via .exec() as follows:
function* matchAll(str, regExp) {
  if (!regExp.global) {
    throw new TypeError('Flag /g must be set!');
  }
  const localCopy = new RegExp(regExp, regExp.flags);
  let match;
  while (match = localCopy.exec(str)) {
    yield match;
  }
}
Making a local copy ensures two things:
regExp.lastIndex isn’t changed.
localCopy.lastIndex is zero. (In this respect, the implementation is a simplification: the built-in method starts matching at regExp.lastIndex.)
Using matchAll():
const str = '"fee" "fi" "fo" "fum"';
const regex = /"([^"]*)"/g;
for (const match of matchAll(str, regex)) {
  console.log(match[1]);
}
// Output:
// 'fee'
// 'fi'
// 'fo'
// 'fum'
regExp.exec() vs. str.match() vs. str.matchAll()
The following table summarizes the differences between three methods:
 | Without /g | With /g |
---|---|---|
regExp.exec(str) | First match object | Next match object or null |
str.match(regExp) | First match object | Array of group 0 captures |
str.matchAll(regExp) | TypeError | Iterable over match objects |
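The rows of the table can be tried out side by side. The following sketch uses the input string from the earlier examples; the variable names are illustrative.

```javascript
const input = 'ab aab';

// Without /g: .exec() and .match() both return the first match object
const viaExec = /(a+)b/.exec(input);
const viaMatch = input.match(/(a+)b/);
console.log(viaExec.index, viaMatch.index); // 0 0

// With /g: .match() only returns the group 0 captures
const group0 = input.match(/(a+)b/g);
console.log(group0); // [ 'ab', 'aab' ]

// With /g: .matchAll() gives access to all groups of all matches
const group1 = Array.from(input.matchAll(/(a+)b/g), (m) => m[1]);
console.log(group1); // [ 'a', 'aa' ]
```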
str.replace() and str.replaceAll()
Both replacing methods have two parameters:
str.replace(searchValue, replacementValue)
str.replaceAll(searchValue, replacementValue)
searchValue can be:
A string
A regular expression
replacementValue can be:
A string: matches are replaced with this string. The character $ has special meaning and lets us insert captures of groups and more (details are explained later).
A function: matches are replaced by strings that are computed via this function.
The two methods differ as follows:
.replace() replaces the first occurrence of a string or a regular expression without /g.
.replaceAll() replaces all occurrences of a string or a regular expression with /g.
This table summarizes how that works:
Search for: → | string | RegExp w/o /g | RegExp with /g |
---|---|---|---|
.replace | First occurrence | First occurrence | (All occurrences) |
.replaceAll | All occurrences | TypeError | All occurrences |
The last column of .replace() is in parentheses because this method existed long before .replaceAll() and therefore supports functionality that should now be handled via the latter method. If we could change that, .replace() would throw a TypeError here.
We first explore how .replace() and .replaceAll() work individually when replacementValue is a simple string (without the character $). Then we examine how both are affected by more complicated replacement values.
str.replace(searchValue, replacementValue) [ES3]
How .replace() operates is influenced by its first parameter searchValue:
Regular expression without /g: Replace first match of this regular expression.
> 'aaa'.replace(/a/, 'x')
'xaa'
String: Replace first occurrence of this string (the string is interpreted verbatim, not as a regular expression).
> 'aaa'.replace('a', 'x')
'xaa'
Regular expression with /g: Replace all matches of this regular expression.
> 'aaa'.replace(/a/g, 'x')
'xxx'
Recommendation: If .replaceAll() is available, it’s better to use that method in this case – its purpose is to replace multiple occurrences.
If we want to replace every occurrence of a string, we have two options:
We can use .replaceAll() (which was introduced in ES2021).
Later in this chapter, we will encounter the tool function escapeForRegExp(), which will help us convert a string into a regular expression that matches that string multiple times (e.g., '*' becomes /\*/g).
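To give an idea of how the second option could work, here is a minimal sketch. The helper escapeForRegExpSketch() is a hypothetical stand-in written for this example (it is not the tool function mentioned above): it puts a backslash in front of each of the syntax characters listed earlier in this chapter. replaceEvery() is likewise an illustrative name.

```javascript
// Assumption: escaping the syntax characters \ ^ $ . * + ? ( ) [ ] { } |
// is enough to make a string match itself verbatim.
function escapeForRegExpSketch(str) {
  return str.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
}

// Replaces every occurrence of the string `search` via .replace() and /g
function replaceEvery(str, search, replacement) {
  const regExp = new RegExp(escapeForRegExpSketch(search), 'g');
  return str.replace(regExp, replacement);
}

console.log(escapeForRegExpSketch('*'));      // '\*' (a backslash and a star)
console.log(replaceEvery('1*2*3', '*', 'x')); // '1x2x3'
```

Note that the character $ still has special meaning in the replacement string of this sketch.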
str.replaceAll(searchValue, replacementValue) [ES2021]
How .replaceAll() operates is influenced by its first parameter searchValue:
Regular expression with /g: Replace all matches of this regular expression.
> 'aaa'.replaceAll(/a/g, 'x')
'xxx'
String: Replace all occurrences of this string (the string is interpreted verbatim, not as a regular expression).
> 'aaa'.replaceAll('a', 'x')
'xxx'
Regular expression without /g: A TypeError is thrown (because the purpose of .replaceAll() is to replace multiple occurrences).
> 'aaa'.replaceAll(/a/, 'x')
TypeError: String.prototype.replaceAll called with a non-global RegExp argument
The parameter replacementValue of .replace() and .replaceAll()
So far, we have only used the parameter replacementValue with simple strings, but it can do more. If its value is:
A string, then matches are replaced with this string. The character $ has special meaning and lets us insert captures of groups and more (read on for details).
A function, then matches are replaced by strings that are computed via this function.
replacementValue is a string
If the replacement value is a string, the dollar sign has special meaning – it inserts text matched by the regular expression:
Text | Result |
---|---|
$$ | single $ |
$& | complete match |
$` | text before match |
$' | text after match |
$n | capture of numbered group n (n > 0) |
$<name> | capture of named group name [ES2018] |
Example: Inserting the text before, inside, and after the matched substring.
> 'a1 a2'.replaceAll(/a/g, "($`|$&|$')")
'(|a|1 a2)1 (a1 |a|2)2'
Example: Inserting the captures of numbered groups.
> const regExp = /^([A-Za-z]+): (.*)$/ug;
> 'first: Jane'.replaceAll(regExp, 'KEY: $1, VALUE: $2')
'KEY: first, VALUE: Jane'
Example: Inserting the captures of named groups.
> const regExp = /^(?<key>[A-Za-z]+): (?<value>.*)$/ug;
> 'first: Jane'.replaceAll(regExp, 'KEY: $<key>, VALUE: $<value>')
'KEY: first, VALUE: Jane'
Exercise: Change quotes via .replace() and a named group
exercises/regexps/change_quotes_test.mjs
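Two more small sketches of the replacement syntax from the table: reordering named captures and inserting a literal dollar sign via $$. The date format and the variable names are only illustrative.

```javascript
// Reordering the captures of named groups
const reDate = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/u;
const reordered = '2024-01-31'.replace(reDate, '$<day>.$<month>.$<year>');
console.log(reordered); // '31.01.2024'

// $$ inserts a single dollar sign, $& the complete match
const withDollar = 'cost: 5'.replace(/[0-9]+/, '$$$&');
console.log(withDollar); // 'cost: $5'
```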
replacementValue is a function
If the replacement value is a function, we can compute each replacement. In the following example, we multiply each non-negative integer that we find by two.
assert.equal(
  '3 cats and 4 dogs'.replaceAll(/[0-9]+/g, (all) => 2 * Number(all)),
  '6 cats and 8 dogs'
);
The replacement function gets the following parameters. Note how similar they are to match objects. These parameters are all positional, but I’ve included how one might name them:
all: complete match
g1: capture of numbered group 1
Etc.
index: where did the match occur?
input: the string in which we are replacing
groups [ES2018]: captures of named groups (an object). Always the last parameter.
If we are only interested in groups, we can use the following technique:
const result = 'first=jane, last=doe'.replace(
  /(?<key>[a-z]+)=(?<value>[a-z]+)/g,
  (...args) => { // (A)
    const groups = args.at(-1); // (B)
    const {key, value} = groups;
    return key.toUpperCase() + '=' + value.toUpperCase();
  });
assert.equal(result, 'FIRST=JANE, LAST=DOE');
Due to the rest parameter in line A, args contains an Array with all parameters. We access the last parameter via the Array method .at() in line B.
String.prototype.split() is described in the chapter on strings. Its first parameter is either a string or a regular expression. If it is the latter, then captures of groups appear in the result:
> 'a:b : c'.split(':')
[ 'a', 'b ', ' c' ]
> 'a:b : c'.split(/ *: */)
[ 'a', 'b', 'c' ]
> 'a:b : c'.split(/( *):( *)/)
[ 'a', '', '', 'b', ' ', ' ', 'c' ]
The flags /g and /y, and the property .lastIndex (advanced)
In this section, we examine how the RegExp flags /g and /y work and how they depend on the RegExp property .lastIndex. We’ll also discover an interesting use case for .lastIndex that you may find surprising.
The flags /g and /y
Every method reacts differently to /g and /y; this gives us a rough general idea:
/g (.global, ES3): The regular expression should match multiple times, anywhere in a string.
/y (.sticky, ES6): Any match inside a string should immediately follow a previous match (the matches “stick” together).
If a regular expression has neither the flag /g nor the flag /y, matching happens once and starts at the beginning.
With either /g or /y, matching is performed relative to a “current position” inside the input string. That position is stored in the regular expression property .lastIndex.
There are three groups of regular-expression-related methods:
The string methods .search(regExp) and .split(regExp) completely ignore /g and /y (and therefore also .lastIndex).
The RegExp methods .exec(str) and .test(str) change in two ways if either /g or /y is set.
First, we get multiple matches, by calling one method repeatedly. Each time, it returns either another result (a match object or true) or an “end of results” value (null or false).
Second, the regular expression property .lastIndex is used to step through the input string. On one hand, .lastIndex determines where matching starts:
/g means that a match must begin at .lastIndex or later.
/y means that a match must begin at .lastIndex. That is, the beginning of the regular expression is anchored to .lastIndex.
Note that ^ and $ continue to work as usual: They anchor matches to the beginning or end of the input string, unless .multiline is set. Then they anchor to the beginnings or ends of lines.
On the other hand, .lastIndex is set to one plus the last index of the previous match.
All other methods are affected as follows:
/g leads to multiple matches.
/y leads to a single match that must start at .lastIndex.
/yg leads to multiple matches without gaps.
This was a first overview. The next sections get into more details.
How exactly are methods affected by /g and /y?
regExp.exec(str) [ES3]
Without /g and /y, .exec() ignores .lastIndex and always returns a match object for the first match:
> const re = /#/; re.lastIndex = 1;
> [re.exec('##-#'), re.lastIndex]
[{ 0: '#', index: 0, input: '##-#' }, 1]
> [re.exec('##-#'), re.lastIndex]
[{ 0: '#', index: 0, input: '##-#' }, 1]
With /g, the match must start at .lastIndex or later. .lastIndex is updated. If there is no match, null is returned.
> const re = /#/g; re.lastIndex = 1;
> [re.exec('##-#'), re.lastIndex]
[{ 0: '#', index: 1, input: '##-#' }, 2]
> [re.exec('##-#'), re.lastIndex]
[{ 0: '#', index: 3, input: '##-#' }, 4]
> [re.exec('##-#'), re.lastIndex]
[null, 0]
With /y, the match must start at exactly .lastIndex. .lastIndex is updated. If there is no match, null is returned.
> const re = /#/y; re.lastIndex = 1;
> [re.exec('##-#'), re.lastIndex]
[{ 0: '#', index: 1, input: '##-#' }, 2]
> [re.exec('##-#'), re.lastIndex]
[null, 0]
With /yg, .exec() behaves the same as with /y.
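Because /y anchors each match at .lastIndex, .exec() can be used to step through an input string token by token, which is the tokenization use case mentioned earlier. The following is a minimal sketch under stated assumptions: the token syntax (non-negative integers, arithmetic operators, parentheses) and the function name tokenize are made up for this example.

```javascript
function tokenize(str) {
  // Created per call, so that .lastIndex is never shared between calls.
  // Each token may be surrounded by whitespace.
  const reToken = /\s*([0-9]+|[-+*\/()])\s*/y;
  const tokens = [];
  let pos = 0;
  while (pos < str.length) {
    reToken.lastIndex = pos;
    const match = reToken.exec(str); // must match exactly at pos
    if (match === null) {
      throw new SyntaxError(`Unexpected input near index ${pos}`);
    }
    tokens.push(match[1]);
    pos = reToken.lastIndex; // one plus the last index of the match
  }
  return tokens;
}

console.log(tokenize('12 + (3*45)'));
// [ '12', '+', '(', '3', '*', '45', ')' ]
```

With /g instead of /y, the regular expression would silently skip over illegal characters such as $ instead of reporting them.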
regExp.test(str) [ES3]
This method behaves the same as .exec(), but instead of returning a match object, it returns true, and instead of returning null, it returns false.
For example, without either /g or /y, the result is always true:
> const re = /#/; re.lastIndex = 1;
> [re.test('##-#'), re.lastIndex]
[true, 1]
> [re.test('##-#'), re.lastIndex]
[true, 1]
With /g, there are two matches:
> const re = /#/g; re.lastIndex = 1;
> [re.test('##-#'), re.lastIndex]
[true, 2]
> [re.test('##-#'), re.lastIndex]
[true, 4]
> [re.test('##-#'), re.lastIndex]
[false, 0]
With /y, there is only one match:
> const re = /#/y; re.lastIndex = 1;
> [re.test('##-#'), re.lastIndex]
[true, 2]
> [re.test('##-#'), re.lastIndex]
[false, 0]
With /yg, .test() behaves the same as with /y.
str.match(regExp) [ES3]
Without /g, .match() works like .exec(). Either without /y:
> const re = /#/; re.lastIndex = 1;
> ['##-#'.match(re), re.lastIndex]
[{ 0: '#', index: 0, input: '##-#' }, 1]
> ['##-#'.match(re), re.lastIndex]
[{ 0: '#', index: 0, input: '##-#' }, 1]
Or with /y:
> const re = /#/y; re.lastIndex = 1;
> ['##-#'.match(re), re.lastIndex]
[{ 0: '#', index: 1, input: '##-#' }, 2]
> ['##-#'.match(re), re.lastIndex]
[null, 0]
With /g, we get all matches (group 0) in an Array. .lastIndex is ignored and reset to zero.
> const re = /#/g; re.lastIndex = 1;
> '##-#'.match(re)
['#', '#', '#']
> re.lastIndex
0
/yg works like /g, but there are no gaps between matches:
> const re = /#/yg; re.lastIndex = 1;
> '##-#'.match(re)
['#', '#']
> re.lastIndex
0
str.matchAll(regExp) [ES2020]
If /g is not set, .matchAll() throws an exception:
> const re = /#/y; re.lastIndex = 1;
> '##-#'.matchAll(re)
TypeError: String.prototype.matchAll called with
a non-global RegExp argument
If /g is set, matching starts at .lastIndex and that property isn’t changed:
> const re = /#/g; re.lastIndex = 1;
> Array.from('##-#'.matchAll(re))
[
  { 0: '#', index: 1, input: '##-#' },
  { 0: '#', index: 3, input: '##-#' },
]
> re.lastIndex
1
If /yg is set, the behavior is the same as with /g, but there are no gaps between matches:
> const re = /#/yg; re.lastIndex = 1;
> Array.from('##-#'.matchAll(re))
[
  { 0: '#', index: 1, input: '##-#' },
]
> re.lastIndex
1
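Because .matchAll() starts at .lastIndex without changing it, we can use the property to skip a prefix of the string. The following sketch illustrates that; the helper name matchIndicesFrom() is hypothetical and note that the helper itself assigns to .lastIndex of its argument.

```javascript
// Hypothetical helper: indices of all matches at or after `index`.
function matchIndicesFrom(regExp, str, index) {
  if (!regExp.global) {
    throw new Error('Flag /g of regExp must be set');
  }
  regExp.lastIndex = index; // .matchAll() starts here
  return Array.from(str.matchAll(regExp), (matchObj) => matchObj.index);
}

const re = /#/g;
const fromStart = matchIndicesFrom(re, '##-#', 0);
const fromOne = matchIndicesFrom(re, '##-#', 1);
console.log(fromStart); // [0, 1, 3]
console.log(fromOne); // [1, 3]
console.log(re.lastIndex); // 1 (not changed by .matchAll())
```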
str.replace(regExp, str) [ES3]
Without /g and /y, only the first occurrence is replaced:
> const re = /#/; re.lastIndex = 1;
> '##-#'.replace(re, 'x')
'x#-#'
> re.lastIndex
1
With /g, all occurrences are replaced. .lastIndex is ignored but reset to zero.
> const re = /#/g; re.lastIndex = 1;
> '##-#'.replace(re, 'x')
'xx-x'
> re.lastIndex
0
With /y, only the (first) occurrence at .lastIndex is replaced. .lastIndex is updated.
> const re = /#/y; re.lastIndex = 1;
> '##-#'.replace(re, 'x')
'#x-#'
> re.lastIndex
2
/yg works like /g, but gaps between matches are not allowed:
> const re = /#/yg; re.lastIndex = 1;
> '##-#'.replace(re, 'x')
'xx-#'
> re.lastIndex
0
str.replaceAll(regExp, str) [ES2021]
.replaceAll() works like .replace() but throws an exception if /g is not set:
> const re = /#/y; re.lastIndex = 1;
> '##-#'.replaceAll(re, 'x')
TypeError: String.prototype.replaceAll called
with a non-global RegExp argument
/g and /y and how to deal with them
We will first look at four pitfalls of /g and /y and then at ways of dealing with those pitfalls.
/g or /y
A regular expression with /g can’t be inlined. For example, in the following while loop, the regular expression is created fresh every time the condition is checked. Therefore, its .lastIndex is always zero and the loop never terminates.
let matchObj;
// Infinite loop
while (matchObj = /a+/g.exec('bbbaabaaa')) {
  console.log(matchObj[0]);
}
With /y, the problem is the same.
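A straightforward way around this pitfall is to create the regular expression once, before the loop, so that its .lastIndex survives from one iteration to the next. The following sketch shows the loop from above rewritten that way; the variable names are only illustrative.

```javascript
// The regular expression is created once, so .lastIndex
// persists between the calls of .exec().
const regExp = /a+/g;
const found = [];
let matchObj;
while ((matchObj = regExp.exec('bbbaabaaa')) !== null) {
  found.push(matchObj[0]);
}
console.log(found); // ['aa', 'aaa']
```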
/g or /y can break code
If code expects a regular expression with /g and has a loop over the results of .exec() or .test(), then a regular expression without /g can cause an infinite loop:
function collectMatches(regExp, str) {
  const matches = [];
  let matchObj;
  // Infinite loop
  while (matchObj = regExp.exec(str)) {
    matches.push(matchObj[0]);
  }
  return matches;
}
collectMatches(/a+/, 'bbbaabaaa'); // Missing: flag /g
Why is there an infinite loop? Because .exec() always returns the first result, a match object, and never null.
With /y, the problem is the same.
/g or /y can break code
With .test(), there is another caveat: It is affected by .lastIndex. Therefore, if we want to check exactly once if a regular expression matches a string, then the regular expression must not have /g. Otherwise, we generally get a different result every time we call .test():
> const regExp = /^X/g;
> [regExp.test('Xa'), regExp.lastIndex]
[ true, 1 ]
> [regExp.test('Xa'), regExp.lastIndex]
[ false, 0 ]
> [regExp.test('Xa'), regExp.lastIndex]
[ true, 1 ]
The first invocation produces a match and updates .lastIndex. The second invocation does not find a match and resets .lastIndex to zero.
If we create a regular expression specifically for .test(), then we probably won’t add /g. However, the likelihood of encountering /g increases if we use the same regular expression for replacing and for testing.
Once again, this problem also exists with /y:
> const regExp = /^X/y;
> regExp.test('Xa')
true
> regExp.test('Xa')
false
> regExp.test('Xa')
true
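If we can't control which flags a regular expression has, one option is to test with a clone: a regular expression created via new RegExp(regExp) starts with .lastIndex being zero, and the original is never modified. The following is a sketch of that idea; the function name testFromStart() is hypothetical.

```javascript
// Hypothetical helper: always tests from the start of str,
// without reading or changing regExp.lastIndex.
function testFromStart(regExp, str) {
  const clone = new RegExp(regExp); // fresh clone, .lastIndex === 0
  return clone.test(str);
}

const regExp = /^X/g;
const results = [
  testFromStart(regExp, 'Xa'),
  testFromStart(regExp, 'Xa'),
  testFromStart(regExp, 'Xa'),
];
console.log(results); // [true, true, true]
console.log(regExp.lastIndex); // 0
```

The price is one extra object per call, which may matter in hot loops.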
.lastIndex isn’t zero
Given all the regular expression operations that are affected by .lastIndex, many algorithms only work correctly if .lastIndex is zero at the beginning. Otherwise, we may get unexpected results:
function countMatches(regExp, str) {
  let count = 0;
  while (regExp.test(str)) {
    count++;
  }
  return count;
}

const myRegExp = /a/g;
myRegExp.lastIndex = 4;
assert.equal(
  countMatches(myRegExp, 'babaa'), 1); // should be 3
Normally, .lastIndex is zero in newly created regular expressions and we won’t change it explicitly like we did in the example. But .lastIndex can still end up not being zero if we use the regular expression multiple times.
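A short sketch of how that can happen without any explicit assignment: a successful .test() with /g leaves .lastIndex behind, and the next call (here with a different string) starts from there.

```javascript
const regExp = /a/g;

// The match ends at index 2, so .lastIndex becomes 2.
const first = [regExp.test('ba'), regExp.lastIndex];

// Matching now starts at index 2, beyond the only 'a' in 'ab'.
const second = [regExp.test('ab'), regExp.lastIndex];

console.log(first); // [true, 2]
console.log(second); // [false, 0]
```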
/g and /y
As an example of dealing with /g and .lastIndex, we revisit countMatches() from the previous example. How do we prevent a wrong regular expression from breaking our code? Let’s look at three approaches.
First, we can throw an exception if /g isn’t set or .lastIndex isn’t zero:
function countMatches(regExp, str) {
  if (!regExp.global) {
    throw new Error('Flag /g of regExp must be set');
  }
  if (regExp.lastIndex !== 0) {
    throw new Error('regExp.lastIndex must be zero');
  }

  let count = 0;
  while (regExp.test(str)) {
    count++;
  }
  return count;
}
Second, we can clone the parameter. That has the added benefit that regExp won’t be changed.
function countMatches(regExp, str) {
  const cloneFlags = regExp.flags + (regExp.global ? '' : 'g');
  const clone = new RegExp(regExp, cloneFlags);
  let count = 0;
  while (clone.test(str)) {
    count++;
  }
  return count;
}
.lastIndex or flags
Several regular expression operations are not affected by .lastIndex or by flags. For example, .match() ignores .lastIndex if /g is present:
function countMatches(regExp, str) {
  if (!regExp.global) {
    throw new Error('Flag /g of regExp must be set');
  }
  return (str.match(regExp) ?? []).length;
}

const myRegExp = /a/g;
myRegExp.lastIndex = 4;
assert.equal(countMatches(myRegExp, 'babaa'), 3); // OK!
Here, countMatches() works even though we didn’t check or fix .lastIndex.
.lastIndex: starting matching at a given index
Apart from storing state, .lastIndex can also be used to start matching at a given index. This section describes how.
Given that .test() is affected by /y and .lastIndex, we can use it to check if a regular expression regExp matches a string str at a given index:
function matchesStringAt(regExp, str, index) {
  if (!regExp.sticky) {
    throw new Error('Flag /y of regExp must be set');
  }
  regExp.lastIndex = index;
  return regExp.test(str);
}
assert.equal(
  matchesStringAt(/x+/y, 'aaxxx', 0), false);
assert.equal(
  matchesStringAt(/x+/y, 'aaxxx', 2), true);
regExp is anchored to .lastIndex due to /y.
Note that we must not use the assertion ^ which would anchor regExp to the beginning of the input string.
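Matching at an exact index is the building block of simple tokenizers: at each position, we try a list of sticky regular expressions and continue where the successful one stopped. The following is only a sketch under assumptions; the token types, the table TOKEN_TYPES and the function tokenize() are hypothetical and not part of the original text.

```javascript
// Hypothetical token definitions. Each regular expression has /y,
// so it only matches exactly at .lastIndex.
const TOKEN_TYPES = [
  ['number', /[0-9]+/y],
  ['operator', /[-+*\/]/y],
  ['whitespace', /\s+/y],
];

function tokenize(str) {
  const tokens = [];
  let index = 0;
  while (index < str.length) {
    let matched = false;
    for (const [type, regExp] of TOKEN_TYPES) {
      regExp.lastIndex = index;
      const matchObj = regExp.exec(str);
      if (matchObj !== null) {
        if (type !== 'whitespace') {
          tokens.push({ type, text: matchObj[0] });
        }
        index = regExp.lastIndex; // continue after the match
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character at index ${index}`);
    }
  }
  return tokens;
}

const tokens = tokenize('12 + 345');
console.log(tokens.map((t) => t.text)); // ['12', '+', '345']
```

Since every pattern here matches at least one character, index always advances and the loop terminates.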
.search() lets us find the location where a regular expression matches:
> '#--#'.search(/#/)
0
Alas, we can’t change where .search() starts looking for matches. As a workaround, we can use .exec() for searching:
function searchAt(regExp, str, index) {
  if (!regExp.global && !regExp.sticky) {
    throw new Error('Either flag /g or flag /y of regExp must be set');
  }
  regExp.lastIndex = index;
  const match = regExp.exec(str);
  if (match) {
    return match.index;
  } else {
    return -1;
  }
}

assert.equal(
  searchAt(/#/g, '#--#', 0), 0);
assert.equal(
  searchAt(/#/g, '#--#', 1), 3);
When used without /g and with /y, .replace() makes one replacement – if there is a match at .lastIndex:
function replaceOnceAt(str, regExp, replacement, index) {
  if (!(regExp.sticky && !regExp.global)) {
    throw new Error('Flag /y must be set, flag /g must not be set');
  }
  regExp.lastIndex = index;
  return str.replace(regExp, replacement);
}
assert.equal(
  replaceOnceAt('aa aaaa a', /a+/y, 'X', 0), 'X aaaa a');
assert.equal(
  replaceOnceAt('aa aaaa a', /a+/y, 'X', 3), 'aa X a');
assert.equal(
  replaceOnceAt('aa aaaa a', /a+/y, 'X', 8), 'aa aaaa X');
.lastIndex
The regular expression property .lastIndex has two significant downsides:
.lastIndex is inconsistent among regular expression operations.
On the upside, .lastIndex also gives us additional useful functionality: We can dictate where matching should begin (for some operations).
.global (/g) and .sticky (/y)
The following two methods are completely unaffected by /g and /y:
String.prototype.search()
String.prototype.split()
This table explains how the remaining regular-expression-related methods are affected by these two flags:
| | / | /g | /y | /yg |
|---|---|---|---|---|
| r.exec(s) | {i:0} | {i:1} | {i:1} | {i:1} |
| | .lI unch | .lI upd | .lI upd | .lI upd |
| r.test(s) | true | true | true | true |
| | .lI unch | .lI upd | .lI upd | .lI upd |
| s.match(r) | {i:0} | ["#","#","#"] | {i:1} | ["#","#"] |
| | .lI unch | .lI reset | .lI upd | .lI reset |
| s.matchAll(r) | TypeError | [{i:1}, {i:3}] | TypeError | [{i:1}] |
| | | .lI unch | | .lI unch |
| s.replace(r, 'x') | "x#-#" | "xx-x" | "#x-#" | "xx-#" |
| | .lI unch | .lI reset | .lI upd | .lI reset |
| s.replaceAll(r, 'x') | TypeError | "xx-x" | TypeError | "xx-#" |
| | | .lI reset | | .lI reset |
Variables:
const r = /#/; r.lastIndex = 1;
const s = '##-#';
Abbreviations:
{i:2}: match object whose property .index has the value 2
.lI upd: .lastIndex is updated
.lI reset: .lastIndex is reset to zero
.lI unch: .lastIndex is unchanged
The previous table was generated via a Node.js script.
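The author's script is not reproduced here. As a rough idea of how such a table can be computed, the following sketch (an assumption, not the original script) recomputes the row for s.replace(r, 'x') by looping over the four flag combinations; the variable names are only illustrative.

```javascript
const s = '##-#';
const rows = [];
for (const flags of ['', 'g', 'y', 'yg']) {
  // A fresh regular expression per flag combination
  const r = new RegExp('#', flags);
  r.lastIndex = 1;
  const result = s.replace(r, 'x');
  rows.push(['/' + flags, result, r.lastIndex]);
}
console.log(rows);
// [['/', 'x#-#', 1], ['/g', 'xx-x', 0], ['/y', '#x-#', 2], ['/yg', 'xx-#', 0]]
```

Comparing .lastIndex before and after each operation tells us whether it was left unchanged, updated or reset.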
The following function escapes an arbitrary text so that it is matched verbatim if we put it inside a regular expression:
function escapeForRegExp(str) {
  return str.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&'); // (A)
}
assert.equal(escapeForRegExp('[yes?]'), String.raw`\[yes\?\]`);
assert.equal(escapeForRegExp('_g_'), String.raw`_g_`);
In line A, we escape all syntax characters. We have to be selective because the regular expression flag /u forbids many escapes – for example: \a \: \-
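escapeForRegExp() has two use cases:
Inserting plain text into a regular expression that we create dynamically via new RegExp().
Replacing all occurrences of a plain text via .replace() (when we can’t use .replaceAll()).
.replace() only lets us replace plain text once. With escapeForRegExp(), we can work around that limitation: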
const plainText = ':-)';
const regExp = new RegExp(escapeForRegExp(plainText), 'ug');
assert.equal(
  ':-) :-) :-)'.replace(regExp, '🙂'), '🙂 🙂 🙂');
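Where .replaceAll() is available (ES2021), passing it a string should give the same result without any escaping. The following sketch compares both routes; it repeats escapeForRegExp() so that it is self-contained, and the variable names are only illustrative.

```javascript
function escapeForRegExp(str) {
  return str.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
}

const plainText = ':-)';
const input = ':-) :-) :-)';

// Route 1: .replace() with an escaped, dynamically created regular expression
const viaRegExp = input.replace(
  new RegExp(escapeForRegExp(plainText), 'ug'), '🙂');

// Route 2: .replaceAll() with the plain string (requires ES2021)
const viaReplaceAll = input.replaceAll(plainText, '🙂');

console.log(viaRegExp === viaReplaceAll); // true
```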
Sometimes, we may need a regular expression that matches everything or nothing – for example, as a default value.
Match everything: /(?:)/
The empty group () matches everything. We make it non-capturing (via ?:), to avoid unnecessary work.
> /(?:)/.test('')
true
> /(?:)/.test('abc')
true
Match nothing: /.^/
^ only matches at the beginning of a string. The dot moves matching beyond the first character and now ^ doesn’t match anymore.
> /.^/.test('')
false
> /.^/.test('abc')
false
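As an illustration of the "default value" use case, the following sketch uses both regular expressions as parameter defaults. The function filterLines() and its parameters are hypothetical; the point is only that the defaults make the include filter accept everything and the exclude filter reject nothing.

```javascript
// Hypothetical function: keeps lines that match `include`
// and don't match `exclude`.
function filterLines(lines, include = /(?:)/, exclude = /.^/) {
  return lines.filter(
    (line) => include.test(line) && !exclude.test(line));
}

const lines = ['a', '', 'b#'];
const all = filterLines(lines);
const onlyHash = filterLines(lines, /#/);
const noHash = filterLines(lines, undefined, /#/);
console.log(all); // ['a', '', 'b#']
console.log(onlyHash); // ['b#']
console.log(noHash); // ['a', '']
```

Callers should pass regular expressions without /g or /y here, because .test() would otherwise depend on .lastIndex.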