JavaScript for impatient programmers (ES2021 edition)

43 Regular expressions (RegExp)



  Availability of features

Unless stated otherwise, each regular expression feature has been available since ES3.

43.1 Creating regular expressions

43.1.1 Literal vs. constructor

The two main ways of creating regular expressions are:

Both regular expressions have the same two parts:
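
As a minimal sketch of the two approaches (the pattern abc and the flags u and i are chosen purely for illustration), the following code creates the same regular expression once via a literal and once via the constructor, then inspects pattern and flags:

```javascript
// Literal: pattern between slashes, flags after the closing slash
const viaLiteral = /abc/ui;

// Constructor: pattern and flags are passed as strings
const viaConstructor = new RegExp('abc', 'ui');

// Both objects have the same pattern (.source) and the same flags (.flags).
// Note that .flags lists the flags in a fixed order, not in the order written.
console.log(viaLiteral.source, viaLiteral.flags);         // abc iu
console.log(viaConstructor.source, viaConstructor.flags); // abc iu
```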

43.1.2 Cloning and non-destructively modifying regular expressions

There are two variants of the constructor RegExp():

The second variant is useful for cloning regular expressions, optionally while modifying them. Flags are immutable and this is the only way of changing them – for example:

function copyAndAddFlags(regExp, flagsToAdd='') {
  // The constructor doesn’t allow duplicate flags;
  // make sure there aren’t any:
  const newFlags = [...new Set(regExp.flags + flagsToAdd)].join('');
  return new RegExp(regExp, newFlags);
}
assert.equal(/abc/i.flags, 'i');
assert.equal(copyAndAddFlags(/abc/i, 'g').flags, 'gi');

43.2 Syntax

43.2.1 Syntax characters

At the top level of a regular expression, the following syntax characters are special. They are escaped by prefixing a backslash (\).

\ ^ $ . * + ? ( ) [ ] { } |

In regular expression literals, we must escape slashes:

> /\//.test('/')
true

In the argument of new RegExp(), we don’t have to escape slashes:

> new RegExp('/').test('/')
true

43.2.2 Basic atoms

Atoms are the basic building blocks of regular expressions.
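
The following sketch shows a few commonly used atoms in action. It is a small selection for illustration, not a complete list; the variable names are made up for this example.

```javascript
// The dot matches any character except line terminators (without flag /s)
const dotMatchesLetter = /^.$/.test('a');   // true
const dotMatchesNewline = /^.$/.test('\n'); // false

// Character class escapes: \d (digits), \s (whitespace), \w (word characters)
const isDigit = /^\d$/.test('7');           // true
const isWhitespace = /^\s$/.test('\t');     // true
const isWord = /^\w+$/.test('abc_123');     // true

// A syntax character must be escaped in order to be matched literally
const literalDot = /^\.$/.test('.');        // true
const escapedDotVsLetter = /^\.$/.test('a'); // false
```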

43.2.3 Unicode property escapes [ES2018]

43.2.3.1 Unicode character properties

In the Unicode standard, each character has properties – metadata describing it. Properties play an important role in defining the nature of a character. Quoting the Unicode Standard, Sect. 3.3, D3:

The semantics of a character are determined by its identity, normative properties, and behavior.

These are a few examples of properties:

43.2.3.2 Unicode property escapes

Unicode property escapes look like this:

  1. \p{prop=value}: matches all characters whose property prop has the value value.
  2. \P{prop=value}: matches all characters that do not have a property prop whose value is value.
  3. \p{bin_prop}: matches all characters whose binary property bin_prop is True.
  4. \P{bin_prop}: matches all characters whose binary property bin_prop is False.

Comments:

Examples:
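
A brief sketch of all four forms, assuming the Unicode properties Script, General_Category (via its value Lowercase_Letter) and White_Space as illustrations. Note that property escapes require the flag /u:

```javascript
// \p{prop=value}: all characters must belong to the Greek script
const isGreek = /^\p{Script=Greek}+$/u.test('αβγ');          // true

// \P{prop=value}: no character may belong to the Greek script
const isNotGreek = /^\P{Script=Greek}+$/u.test('abc');       // true

// General_Category values can be used without the property name
const isLower = /^\p{Lowercase_Letter}+$/u.test('abcπ');     // true

// \p{bin_prop}: binary property White_Space
const isWhiteSpace = /^\p{White_Space}+$/u.test('\t \n');    // true

// \P{bin_prop}: characters that are not whitespace
const hasNoWhiteSpace = /^\P{White_Space}+$/u.test('a b');   // false
```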

Further reading:

43.2.4 Character classes

A character class wraps class ranges in square brackets. The class ranges specify a set of characters:

Rules for class ranges:
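
As a small illustration of how character classes behave (the regular expressions and inputs are made up for this sketch):

```javascript
// Ranges and single characters can be combined in one class
const hexDigits = /^[0-9a-fA-F]+$/;
console.log(hexDigits.test('c0ffee')); // true
console.log(hexDigits.test('xyz'));    // false

// A caret at the beginning negates the class
const noVowels = /^[^aeiou]+$/;
console.log(noVowels.test('rhythm'));  // true
console.log(noVowels.test('rhyme'));   // false

// Most syntax characters lose their special meaning inside a class
console.log(/^[.*+]$/.test('*'));      // true

// A hyphen that comes first or last is matched literally
console.log(/^[a-]$/.test('-'));       // true
```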

43.2.5 Groups

43.2.6 Quantifiers

By default, all of the following quantifiers are greedy (they match as many characters as possible):

To make them reluctant (so that they match as few characters as possible), put question marks (?) after them:

> /".*"/.exec('"abc"def"')[0]  // greedy
'"abc"def"'
> /".*?"/.exec('"abc"def"')[0] // reluctant
'"abc"'

43.2.7 Assertions

43.2.7.1 Lookahead assertions

Positive lookahead: (?=«pattern») matches if pattern matches what comes next.

Example: sequences of lowercase letters that are followed by an X.

> 'abcX def'.match(/[a-z]+(?=X)/g)
[ 'abc' ]

Note that the X itself is not part of the matched substring.

Negative lookahead: (?!«pattern») matches if pattern does not match what comes next.

Example: sequences of lowercase letters that are not followed by an X.

> 'abcX def'.match(/[a-z]+(?!X)/g)
[ 'ab', 'def' ]

43.2.7.2 Lookbehind assertions [ES2018]

Positive lookbehind: (?<=«pattern») matches if pattern matches what came before.

Example: sequences of lowercase letters that are preceded by an X.

> 'Xabc def'.match(/(?<=X)[a-z]+/g)
[ 'abc' ]

Negative lookbehind: (?<!«pattern») matches if pattern does not match what came before.

Example: sequences of lowercase letters that are not preceded by an X.

> 'Xabc def'.match(/(?<!X)[a-z]+/g)
[ 'bc', 'def' ]

Example: replace “.js” with “.html”, but not in “Node.js”.

> 'Node.js: index.js and main.js'.replace(/(?<!Node)\.js/g, '.html')
'Node.js: index.html and main.html'

43.2.8 Disjunction (|)

Caveat: this operator has low precedence. Use groups if necessary:
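
A short sketch of the precedence issue, using made-up patterns: without a group, each assertion only belongs to the alternative it is next to.

```javascript
// Parsed as: (^aa) or (zz$)
const ungrouped = /^aa|zz$/;
console.log(ungrouped.test('aaXX')); // true (starts with 'aa')
console.log(ungrouped.test('XXzz')); // true (ends with 'zz')

// The group limits the disjunction; ^ and $ now apply to both alternatives
const grouped = /^(aa|zz)$/;
console.log(grouped.test('aaXX'));   // false
console.log(grouped.test('aa'));     // true
console.log(grouped.test('zz'));     // true
```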

43.3 Flags

Table 21: These are the regular expression flags supported by JavaScript.
Literal flag  Property name  ES      Description
g             global         ES3     Match multiple times
i             ignoreCase     ES3     Match case-insensitively
m             multiline      ES3     ^ and $ match per line
s             dotAll         ES2018  Dot matches line terminators
u             unicode        ES6     Unicode mode (recommended)
y             sticky         ES6     No characters between matches

The following regular expression flags are available in JavaScript (tbl. 21 provides a compact overview):
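
To make the table more concrete, here is a sketch that demonstrates three of the flags (/i, /m and /s) with a made-up two-line input. The flags /g, /u and /y are covered in more detail later in this chapter.

```javascript
const text = 'one\ntwo';

// /i: match case-insensitively
console.log(/ONE/.test(text));         // false
console.log(/ONE/i.test(text));        // true

// /m: ^ and $ match at the beginning and end of each line
console.log(text.match(/^\w+$/g));     // null
console.log(text.match(/^\w+$/gm));    // [ 'one', 'two' ]

// /s: the dot also matches line terminators
console.log(/^.+$/.test(text));        // false
console.log(/^.+$/s.test(text));       // true
```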

43.3.1 Flag: Unicode mode via /u

The flag /u switches on a special Unicode mode for regular expressions. That mode enables several features:

The following subsections explain the last item in more detail. They use the following Unicode character to explain when the atomic units are Unicode characters and when they are JavaScript characters:

const codePoint = '🙂';
const codeUnits = '\uD83D\uDE42'; // UTF-16

assert.equal(codePoint, codeUnits); // same string!

I’m only switching between 🙂 and \uD83D\uDE42, to illustrate how JavaScript sees things. Both are equivalent and can be used interchangeably in strings and regular expressions.

43.3.1.1 Consequence: we can put Unicode characters in character classes

With /u, the two code units of 🙂 are treated as a single character:

> /^[🙂]$/u.test('🙂')
true

Without /u, 🙂 is treated as two characters:

> /^[\uD83D\uDE42]$/.test('\uD83D\uDE42')
false
> /^[\uD83D\uDE42]$/.test('\uDE42')
true

Note that ^ and $ demand that the input string have a single character. That’s why the first result is false.

43.3.1.2 Consequence: the dot operator (.) matches Unicode characters, not JavaScript characters

With /u, the dot operator matches Unicode characters:

> '🙂'.match(/./gu).length
1

.match() plus /g returns an Array with all the matches of a regular expression.

Without /u, the dot operator matches JavaScript characters:

> '\uD83D\uDE42'.match(/./g).length
2

43.3.1.3 Consequence: quantifiers apply to Unicode characters, not JavaScript characters

With /u, a quantifier applies to the whole preceding Unicode character:

> /^🙂{3}$/u.test('🙂🙂🙂')
true

Without /u, a quantifier only applies to the preceding JavaScript character:

> /^\uD83D\uDE42{3}$/.test('\uD83D\uDE42\uDE42\uDE42')
true

43.4 Properties of regular expression objects

Noteworthy:

43.4.1 Flags as properties

Each regular expression flag exists as a property with a longer, more descriptive name:

> /a/i.ignoreCase
true
> /a/.ignoreCase
false

This is the complete list of flag properties:

43.4.2 Other properties

Each regular expression also has the following properties:
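
The following sketch inspects three standard properties of RegExp objects: .source, .flags and .lastIndex. Whether these are exactly the properties meant above is an assumption; the regular expression is made up for illustration.

```javascript
const regExp = /ab+c/gi;

// .source: the pattern as a string
console.log(regExp.source);    // 'ab+c'

// .flags: all flags as a string
console.log(regExp.flags);     // 'gi'

// .lastIndex: where matching starts if /g or /y is set (initially zero)
console.log(regExp.lastIndex); // 0
regExp.test('xABBC');
console.log(regExp.lastIndex); // 5
```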

43.5 Methods for working with regular expressions

43.5.1 By default, regular expressions match anywhere in a string

By default, regular expressions match anywhere in a string:

> /a/.test('__a__')
true

We can change that by using assertions such as ^ or by using the flag /y:

> /^a/.test('__a__')
false
> /^a/.test('a__')
true
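
The flag /y is explained in detail later in this chapter. As a quick sketch of its effect (the variable name is made up): a match must begin exactly at the index stored in .lastIndex, which is zero for a new regular expression.

```javascript
const stickyA = /a/y;

// .lastIndex is 0 and there is no 'a' at index 0
const atStart = stickyA.test('__a__'); // false

// Move the starting position to the index of the 'a'
stickyA.lastIndex = 2;
const atIndex2 = stickyA.test('__a__'); // true

// After a successful match, .lastIndex points behind the match
const lastIndexAfterMatch = stickyA.lastIndex; // 3
```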

43.5.2 regExp.test(str): is there a match? [ES3]

The regular expression method .test() returns true if regExp matches str:

> /bc/.test('ABCD')
false
> /bc/i.test('ABCD')
true
> /\.mjs$/.test('main.mjs')
true

With .test() we should normally avoid the /g flag. If we use it, we generally don’t get the same result every time we call the method:

> const r = /a/g;
> r.test('aab')
true
> r.test('aab')
true
> r.test('aab')
false

The results are due to /a/ having two matches in the string. After all of those were found, .test() returns false.

43.5.3 str.search(regExp): at what index is the match? [ES3]

The string method .search() returns the first index of str at which there is a match for regExp:

> '_abc_'.search(/abc/)
1
> 'main.mjs'.search(/\.mjs$/)
4

43.5.4 regExp.exec(str): capturing groups [ES3]

43.5.4.1 Getting a match object for the first match

Without the flag /g, .exec() returns the captures of the first match for regExp in str:

assert.deepEqual(
  /(a+)b/.exec('ab aab'),
  {
    0: 'ab',
    1: 'a',
    index: 0,
    input: 'ab aab',
    groups: undefined,
  }
);

The result is a match object with the following properties:

43.5.4.2 Named capture groups [ES2018]

The previous example contained a single positional group. The following example demonstrates named groups:

assert.deepEqual(
  /(?<as>a+)b/.exec('ab aab'),
  {
    0: 'ab',
    1: 'a',
    index: 0,
    input: 'ab aab',
    groups: { as: 'a' },
  }
);

In the result of .exec(), we can see that a named group is also a positional group – its capture exists twice:
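
A short sketch of both ways of accessing the capture, reusing the regular expression from the previous example (the variable name capturedAs is made up):

```javascript
const match = /(?<as>a+)b/.exec('ab aab');

// Access via position
console.log(match[1]);        // 'a'

// Access via name
console.log(match.groups.as); // 'a'

// Named captures work well with destructuring
const { as: capturedAs } = match.groups;
console.log(capturedAs);      // 'a'
```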

43.5.4.3 Looping over all matches

  Better alternative for retrieving all matches: str.matchAll(regExp) [ES2020]

Since ECMAScript 2020, JavaScript has another method for retrieving all matches: str.matchAll(regExp). This method is easier to use and has fewer caveats.

If we want to retrieve all matches of a regular expression (not just the first one), we need to switch on the flag /g. Then we can call .exec() multiple times and get one match each time. After the last match, .exec() returns null.

> const regExp = /(a+)b/g;
> regExp.exec('ab aab')
{ 0: 'ab', 1: 'a', index: 0, input: 'ab aab', groups: undefined }
> regExp.exec('ab aab')
{ 0: 'aab', 1: 'aa', index: 3, input: 'ab aab', groups: undefined }
> regExp.exec('ab aab')
null

Therefore, we can loop over all matches as follows:

const regExp = /(a+)b/g;
const str = 'ab aab';

let match;
// Check for null via truthiness
// Alternative: while ((match = regExp.exec(str)) !== null)
while (match = regExp.exec(str)) {
  console.log(match[1]);
}
// Output:
// 'a'
// 'aa'

  Be careful when sharing regular expressions with /g!

Sharing regular expressions with /g has a few pitfalls, which are explained later.

  Exercise: Extracting quoted text via .exec()

exercises/regexps/extract_quoted_test.mjs

43.5.5 str.match(regExp): getting all group 0 captures [ES3]

Without /g, .match() works like .exec() – it returns a single match object.

With /g, .match() returns all substrings of str that match regExp:

> 'ab aab'.match(/(a+)b/g)
[ 'ab', 'aab' ]

If there is no match, .match() returns null:

> 'xyz'.match(/(a+)b/g)
null

We can use the nullish coalescing operator (??) to protect ourselves against null:

const numberOfMatches = (str.match(regExp) ?? []).length;

43.5.6 str.matchAll(regExp): getting an iterable over all match objects [ES2020]

This is how .matchAll() is invoked:

const matchIterable = str.matchAll(regExp);

Given a string and a regular expression, .matchAll() returns an iterable over the match objects of all matches.
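
Because the result is an iterable, we can loop over it with for-of. A minimal sketch that collects the index of each match:

```javascript
const indices = [];
for (const match of '-a-a-a'.matchAll(/-(a)/ug)) {
  // Each match is a match object, like the ones returned by .exec()
  indices.push(match.index);
}
console.log(indices); // [ 0, 2, 4 ]
```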

We can also use the spread operator (...) to convert the iterable to an Array:

> [...'-a-a-a'.matchAll(/-(a)/ug)]
[
  { 0:'-a', 1:'a', index: 0, input: '-a-a-a', groups: undefined },
  { 0:'-a', 1:'a', index: 2, input: '-a-a-a', groups: undefined },
  { 0:'-a', 1:'a', index: 4, input: '-a-a-a', groups: undefined },
]

Flag /g must be set:

> [...'-a-a-a'.matchAll(/-(a)/u)]
TypeError: String.prototype.matchAll called with a non-global
RegExp argument

.matchAll() starts matching at regExp.lastIndex but doesn’t change that property.

43.5.6.1 Implementing .matchAll()

.matchAll() could be implemented via .exec() as follows:

function* matchAll(str, regExp) {
  if (!regExp.global) {
    throw new TypeError('Flag /g must be set!');
  }
  const localCopy = new RegExp(regExp, regExp.flags);
  let match;
  while (match = localCopy.exec(str)) {
    yield match;
  }
}

Making a local copy ensures two things:

Using matchAll():

const str = '"fee" "fi" "fo" "fum"';
const regex = /"([^"]*)"/g;

for (const match of matchAll(str, regex)) {
  console.log(match[1]);
}
// Output:
// 'fee'
// 'fi'
// 'fo'
// 'fum'

43.5.7 regExp.exec() vs. str.match() vs. str.matchAll()

The following table summarizes the differences between the three methods:

                      Without /g          With /g
regExp.exec(str)      First match object  Next match object or null
str.match(regExp)     First match object  Array of group 0 captures
str.matchAll(regExp)  TypeError           Iterable over match objects

43.5.8 Replacing with str.replace() and str.replaceAll()

Both replacing methods have two parameters:

searchValue can be:

replacementValue can be:

The two methods differ as follows:

This table summarizes how that works:

Search for:  string            RegExp without /g  RegExp with /g
.replace     First occurrence  First occurrence   (All occurrences)
.replaceAll  All occurrences   TypeError          All occurrences

The last column of .replace() is in parentheses because this method existed long before .replaceAll() and therefore supports functionality that now should be handled via the latter method. If we could change that, .replace() would throw a TypeError here.

We first explore how .replace() and .replaceAll() work individually when replacementValue is a simple string (without the character $). Then we examine how both are affected by more complicated replacement values.

43.5.8.1 str.replace(searchValue, replacementValue) [ES3]

How .replace() operates is influenced by its first parameter searchValue:

If we want to replace every occurrence of a string, we have two options:
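
A sketch of two ways to replace every occurrence, assuming these are the options meant here: .replaceAll() with a string, and .replace() with a regular expression that has the flag /g. The input string is made up.

```javascript
const str = 'a.b.c';

// For comparison: .replace() with a string only replaces the first occurrence
const firstOnly = str.replace('.', '-');       // 'a-b.c'

// Option: .replaceAll() with a string [ES2021]
const viaReplaceAll = str.replaceAll('.', '-'); // 'a-b-c'

// Option: .replace() with a regular expression and /g
// (the dot is a syntax character and must be escaped)
const viaRegExp = str.replace(/\./g, '-');      // 'a-b-c'
```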

43.5.8.2 str.replaceAll(searchValue, replacementValue) [ES2021]

How .replaceAll() operates is influenced by its first parameter searchValue:
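
The three cases from the summary table can be sketched as follows (the input and the variable names are made up):

```javascript
const input = 'aXbX';

// String: all occurrences are replaced
const viaString = input.replaceAll('X', '-');          // 'a-b-'

// Regular expression with /g: all occurrences are replaced
const viaGlobalRegExp = input.replaceAll(/x/gi, '-');  // 'a-b-'

// Regular expression without /g: an exception is thrown
let errorName = null;
try {
  input.replaceAll(/x/i, '-');
} catch (err) {
  errorName = err.name;
}
console.log(errorName); // 'TypeError'
```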

43.5.8.3 The parameter replacementValue of .replace() and .replaceAll()

So far, we have only used the parameter replacementValue with simple strings, but it can do more. If its value is:

43.5.8.4 replacementValue is a string

If the replacement value is a string, the dollar sign has special meaning – it inserts text matched by the regular expression:

Text     Result
$$       single $
$&       complete match
$`       text before match
$'       text after match
$n       capture of positional group n (n > 0)
$<name>  capture of named group name [ES2018]

Example: Inserting the text before, inside, and after the matched substring.

> 'a1 a2'.replaceAll(/a/g, "($`|$&|$')")
'(|a|1 a2)1 (a1 |a|2)2'

Example: Inserting the captures of positional groups.

> const regExp = /^([A-Za-z]+): (.*)$/ug;
> 'first: Jane'.replaceAll(regExp, 'KEY: $1, VALUE: $2')
'KEY: first, VALUE: Jane'

Example: Inserting the captures of named groups.

> const regExp = /^(?<key>[A-Za-z]+): (?<value>.*)$/ug;
> 'first: Jane'.replaceAll(regExp, 'KEY: $<key>, VALUE: $<value>')
'KEY: first, VALUE: Jane'

  Exercise: Change quotes via .replace() and a named group

exercises/regexps/change_quotes_test.mjs

43.5.8.5 replacementValue is a function

If the replacement value is a function, we can compute each replacement. In the following example, we multiply each non-negative integer that we find by two.

assert.equal(
  '3 cats and 4 dogs'.replaceAll(/[0-9]+/g, (all) => 2 * Number(all)),
  '6 cats and 8 dogs'
);

The replacement function gets the following parameters. Note how similar they are to match objects. These parameters are all positional, but I’ve included how one might name them:
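
As a sketch, the following code records the arguments that a replacement function receives. The parameter names (all, g1, g2, index, input, groups) are made up; only their order matters. The last parameter, groups, is only passed if the regular expression has named groups [ES2018].

```javascript
const regExp = /(?<key>[a-z]+)=(?<value>[0-9]+)/g;
const calls = [];

const result = 'a=1, bc=23'.replace(
  regExp,
  (all, g1, g2, index, input, groups) => {
    // all: complete match; g1, g2: captures of the positional groups;
    // index: where the match was found; input: the whole string;
    // groups: captures of the named groups
    calls.push({ all, g1, g2, index, key: groups.key, value: groups.value });
    return `${g1}:${g2}`;
  }
);

console.log(result);   // 'a:1, bc:23'
console.log(calls[1]);
// { all: 'bc=23', g1: 'bc', g2: '23', index: 5, key: 'bc', value: '23' }
```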

43.5.9 Other methods for working with regular expressions

String.prototype.split() is described in the chapter on strings. Its first parameter is either a string or a regular expression. If it is the latter, then captures of groups appear in the result:

> 'a:b : c'.split(':')
[ 'a', 'b ', ' c' ]
> 'a:b : c'.split(/ *: */)
[ 'a', 'b', 'c' ]
> 'a:b : c'.split(/( *):( *)/)
[ 'a', '', '', 'b', ' ', ' ', 'c' ]

43.6 The flags /g and /y, and the property .lastIndex (advanced)

In this section, we examine how the RegExp flags /g and /y work and how they depend on the RegExp property .lastIndex. We’ll also discover an interesting use case for .lastIndex that you may find surprising.

43.6.1 The flags /g and /y

Every method reacts differently to /g and /y; this gives us a rough general idea:

If a regular expression has neither the flag /g nor the flag /y, matching happens once and starts at the beginning.

With either /g or /y, matching is performed relative to a “current position” inside the input string. That position is stored in the regular expression property .lastIndex.

There are three groups of regular-expression-related methods:

  1. The string methods .search(regExp) and .split(regExp) completely ignore /g and /y (and therefore also .lastIndex).

  2. The RegExp methods .exec(str) and .test(str) change in two ways if either /g or /y is set.

    First, we get multiple matches, by calling one method repeatedly. Each time, it returns either another result (a match object or true) or an “end of results” value (null or false).

    Second, the regular expression property .lastIndex is used to step through the input string. On one hand, .lastIndex determines where matching starts:

    • /g means that a match must begin at .lastIndex or later.

    • /y means that a match must begin at .lastIndex. That is, the beginning of the regular expression is anchored to .lastIndex.

      Note that ^ and $ continue to work as usual: They anchor matches to the beginning or end of the input string, unless .multiline is set. Then they anchor to the beginnings or ends of lines.

    On the other hand, .lastIndex is set to one plus the last index of the previous match.

  3. All other methods are affected as follows:

    • /g leads to multiple matches.
    • /y leads to a single match that must start at .lastIndex.
    • /yg leads to multiple matches without gaps.

This was a first overview. The next sections get into more details.

43.6.2 How exactly are methods affected by /g and /y?

43.6.2.1 regExp.exec(str) [ES3]

Without /g and /y, .exec() ignores .lastIndex and always returns a match object for the first match:

> const re = /#/; re.lastIndex = 1;
> [re.exec('##-#'), re.lastIndex]
[{ 0: '#', index: 0, input: '##-#' }, 1]
> [re.exec('##-#'), re.lastIndex]
[{ 0: '#', index: 0, input: '##-#' }, 1]

With /g, the match must start at .lastIndex or later. .lastIndex is updated. If there is no match, null is returned.

> const re = /#/g; re.lastIndex = 1;
> [re.exec('##-#'), re.lastIndex]
[{ 0: '#', index: 1, input: '##-#' }, 2]
> [re.exec('##-#'), re.lastIndex]
[{ 0: '#', index: 3, input: '##-#' }, 4]
> [re.exec('##-#'), re.lastIndex]
[null, 0]

With /y, the match must start at exactly .lastIndex. .lastIndex is updated. If there is no match, null is returned.

> const re = /#/y; re.lastIndex = 1;
> [re.exec('##-#'), re.lastIndex]
[{ 0: '#', index: 1, input: '##-#' }, 2]
> [re.exec('##-#'), re.lastIndex]
[null, 0]

With /yg, .exec() behaves the same as with /y.

43.6.2.2 regExp.test(str) [ES3]

This method behaves the same as .exec(), but instead of returning a match object, it returns true, and instead of returning null, it returns false.

For example, without either /g or /y, the result is always true:

> const re = /#/; re.lastIndex = 1;
> [re.test('##-#'), re.lastIndex]
[true, 1]
> [re.test('##-#'), re.lastIndex]
[true, 1]

With /g, there are two matches:

> const re = /#/g; re.lastIndex = 1;
> [re.test('##-#'), re.lastIndex]
[true, 2]
> [re.test('##-#'), re.lastIndex]
[true, 4]
> [re.test('##-#'), re.lastIndex]
[false, 0]

With /y, there is only one match:

> const re = /#/y; re.lastIndex = 1;
> [re.test('##-#'), re.lastIndex]
[true, 2]
> [re.test('##-#'), re.lastIndex]
[false, 0]

With /yg, .test() behaves the same as with /y.

43.6.2.3 str.match(regExp) [ES3]

Without /g, .match() works like .exec(). Either without /y:

> const re = /#/; re.lastIndex = 1;
> ['##-#'.match(re), re.lastIndex]
[{ 0: '#', index: 0, input: '##-#' }, 1]
> ['##-#'.match(re), re.lastIndex]
[{ 0: '#', index: 0, input: '##-#' }, 1]

Or with /y:

> const re = /#/y; re.lastIndex = 1;
> ['##-#'.match(re), re.lastIndex]
[{ 0: '#', index: 1, input: '##-#' }, 2]
> ['##-#'.match(re), re.lastIndex]
[null, 0]

With /g, we get all matches (group 0) in an Array. .lastIndex is ignored and reset to zero.

> const re = /#/g; re.lastIndex = 1;
> '##-#'.match(re)
['#', '#', '#']
> re.lastIndex
0

/yg works like /g, but there are no gaps between matches:

> const re = /#/yg; re.lastIndex = 1;
> '##-#'.match(re)
['#', '#']
> re.lastIndex
0

43.6.2.4 str.matchAll(regExp) [ES2020]

If /g is not set, .matchAll() throws an exception:

> const re = /#/y; re.lastIndex = 1;
> '##-#'.matchAll(re)
TypeError: String.prototype.matchAll called with
a non-global RegExp argument

If /g is set, matching starts at .lastIndex and that property isn’t changed:

> const re = /#/g; re.lastIndex = 1;
> [...'##-#'.matchAll(re)]
[
  { 0: '#', index: 1, input: '##-#' },
  { 0: '#', index: 3, input: '##-#' },
]
> re.lastIndex
1

If /yg is set, the behavior is the same as with /g, but there are no gaps between matches:

> const re = /#/yg; re.lastIndex = 1;
> [...'##-#'.matchAll(re)]
[
  { 0: '#', index: 1, input: '##-#' },
]
> re.lastIndex
1

43.6.2.5 str.replace(regExp, str) [ES3]

Without /g and /y, only the first occurrence is replaced:

> const re = /#/; re.lastIndex = 1;
> '##-#'.replace(re, 'x')
'x#-#'
> re.lastIndex
1

With /g, all occurrences are replaced. .lastIndex is ignored but reset to zero.

> const re = /#/g; re.lastIndex = 1;
> '##-#'.replace(re, 'x')
'xx-x'
> re.lastIndex
0

With /y, only the (first) occurrence at .lastIndex is replaced. .lastIndex is updated.

> const re = /#/y; re.lastIndex = 1;
> '##-#'.replace(re, 'x')
'#x-#'
> re.lastIndex
2

/yg works like /g, but gaps between matches are not allowed:

> const re = /#/yg; re.lastIndex = 1;
> '##-#'.replace(re, 'x')
'xx-#'
> re.lastIndex
0

43.6.2.6 str.replaceAll(regExp, str) [ES2021]

.replaceAll() works like .replace() but throws an exception if /g is not set:

> const re = /#/y; re.lastIndex = 1;
> '##-#'.replaceAll(re, 'x')
TypeError: String.prototype.replaceAll called
with a non-global RegExp argument

43.6.3 Four pitfalls of /g and /y and how to deal with them

We will first look at four pitfalls of /g and /y and then at ways of dealing with those pitfalls.

43.6.3.1 Pitfall 1: We can’t inline a regular expression with /g or /y

A regular expression with /g can’t be inlined. For example, in the following while loop, the regular expression is created fresh every time the condition is checked. Therefore, its .lastIndex is always zero and the loop never terminates.

let matchObj;
// Infinite loop
while (matchObj = /a+/g.exec('bbbaabaaa')) {
  console.log(matchObj[0]);
}

With /y, the problem is the same.

43.6.3.2 Pitfall 2: Removing /g or /y can break code

If code expects a regular expression with /g and has a loop over the results of .exec() or .test(), then a regular expression without /g can cause an infinite loop:

function collectMatches(regExp, str) {
  const matches = [];
  let matchObj;
  // Infinite loop
  while (matchObj = regExp.exec(str)) {
    matches.push(matchObj[0]);
  }
  return matches;
}
collectMatches(/a+/, 'bbbaabaaa'); // Missing: flag /g

Why is there an infinite loop? Because .exec() always returns the first result, a match object, and never null.

With /y, the problem is the same.

43.6.3.3 Pitfall 3: Adding /g or /y can break code

With .test(), there is another caveat: It is affected by .lastIndex. Therefore, if we want to check exactly once if a regular expression matches a string, then the regular expression must not have /g. Otherwise, we generally get a different result every time we call .test():

> const regExp = /^X/g;
> [regExp.test('Xa'), regExp.lastIndex]
[ true, 1 ]
> [regExp.test('Xa'), regExp.lastIndex]
[ false, 0 ]
> [regExp.test('Xa'), regExp.lastIndex]
[ true, 1 ]

The first invocation produces a match and updates .lastIndex. The second invocation does not find a match and resets .lastIndex to zero.

If we create a regular expression specifically for .test(), then we probably won’t add /g. However, the likelihood of encountering /g increases if we use the same regular expression for replacing and for testing.

Once again, this problem also exists with /y:

> const regExp = /^X/y;
> regExp.test('Xa')
true
> regExp.test('Xa')
false
> regExp.test('Xa')
true

43.6.3.4 Pitfall 4: Code can produce unexpected results if .lastIndex isn’t zero

Because so many regular expression operations are affected by .lastIndex, many algorithms only work correctly if .lastIndex is zero at the beginning. Otherwise, we may get unexpected results:

function countMatches(regExp, str) {
  let count = 0;
  while (regExp.test(str)) {
    count++;
  }
  return count;
}

const myRegExp = /a/g;
myRegExp.lastIndex = 4;
assert.equal(
  countMatches(myRegExp, 'babaa'), 1); // should be 3

Normally, .lastIndex is zero in newly created regular expressions and we won’t change it explicitly like we did in the example. But .lastIndex can still end up not being zero if we use the regular expression multiple times.

43.6.3.5 How to avoid the pitfalls of /g and /y

As an example of dealing with /g and .lastIndex, we revisit countMatches() from the previous example. How do we prevent a wrong regular expression from breaking our code? Let’s look at three approaches.

43.6.3.5.1 Throwing exceptions

First, we can throw an exception if /g isn’t set or .lastIndex isn’t zero:

function countMatches(regExp, str) {
  if (!regExp.global) {
    throw new Error('Flag /g of regExp must be set');
  }
  if (regExp.lastIndex !== 0) {
    throw new Error('regExp.lastIndex must be zero');
  }
  
  let count = 0;
  while (regExp.test(str)) {
    count++;
  }
  return count;
}

43.6.3.5.2 Cloning regular expressions

Second, we can clone the parameter. That has the added benefit that regExp won’t be changed.

function countMatches(regExp, str) {
  const cloneFlags = regExp.flags + (regExp.global ? '' : 'g');
  const clone = new RegExp(regExp, cloneFlags);

  let count = 0;
  while (clone.test(str)) {
    count++;
  }
  return count;
}

43.6.3.5.3 Using an operation that isn’t affected by .lastIndex or flags

Several regular expression operations are not affected by .lastIndex or by flags. For example, .match() ignores .lastIndex if /g is present:

function countMatches(regExp, str) {
  if (!regExp.global) {
    throw new Error('Flag /g of regExp must be set');
  }
  return (str.match(regExp) ?? []).length;
}

const myRegExp = /a/g;
myRegExp.lastIndex = 4;
assert.equal(countMatches(myRegExp, 'babaa'), 3); // OK!

Here, countMatches() works even though we didn’t check or fix .lastIndex.

43.6.4 Use case for .lastIndex: starting matching at a given index

Apart from storing state, .lastIndex can also be used to start matching at a given index. This section describes how.

43.6.4.1 Example: Checking if a regular expression matches at a given index

Given that .test() is affected by /y and .lastIndex, we can use it to check if a regular expression regExp matches a string str at a given index:

function matchesStringAt(regExp, str, index) {
  if (!regExp.sticky) {
    throw new Error('Flag /y of regExp must be set');
  }
  regExp.lastIndex = index;
  return regExp.test(str);
}
assert.equal(
  matchesStringAt(/x+/y, 'aaxxx', 0), false);
assert.equal(
  matchesStringAt(/x+/y, 'aaxxx', 2), true);

regExp is anchored to .lastIndex due to /y.

Note that we must not use the assertion ^, which would anchor regExp to the beginning of the input string.

43.6.4.2 Example: Finding the location of a match, starting at a given index

.search() lets us find the location where a regular expression matches:

> '#--#'.search(/#/)
0

Alas, we can’t change where .search() starts looking for matches. As a work-around, we can use .exec() for searching:

function searchAt(regExp, str, index) {
  if (!regExp.global && !regExp.sticky) {
    throw new Error('Either flag /g or flag /y of regExp must be set');
  }
  regExp.lastIndex = index;
  const match = regExp.exec(str);
  if (match) {
    return match.index;
  } else {
    return -1;
  }
}

assert.equal(
  searchAt(/#/g, '#--#', 0), 0);
assert.equal(
  searchAt(/#/g, '#--#', 1), 3);

43.6.4.3 Example: Replacing an occurrence at a given index

When used without /g and with /y, .replace() makes one replacement – if there is a match at .lastIndex:

function replaceOnceAt(str, regExp, replacement, index) {
  if (!(regExp.sticky && !regExp.global)) {
    throw new Error('Flag /y must be set, flag /g must not be set');
  }
  regExp.lastIndex = index;
  return str.replace(regExp, replacement);
}
assert.equal(
  replaceOnceAt('aa aaaa a', /a+/y, 'X', 0), 'X aaaa a');
assert.equal(
  replaceOnceAt('aa aaaa a', /a+/y, 'X', 3), 'aa X a');
assert.equal(
  replaceOnceAt('aa aaaa a', /a+/y, 'X', 8), 'aa aaaa X');

43.6.5 The downsides of .lastIndex

The regular expression property .lastIndex has two significant downsides:

On the upside, .lastIndex also gives us additional useful functionality: We can dictate where matching should begin (for some operations).

43.6.6 Summary: .global (/g) and .sticky (/y)

The following two methods are completely unaffected by /g and /y: str.search(regExp) and str.split(regExp).
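
A sketch that checks this for .search() and .split(), which were described earlier as ignoring /g, /y and .lastIndex. The value 2 for .lastIndex is arbitrary.

```javascript
const re = /#/g;
re.lastIndex = 2;

// .search() always looks for the first match and ignores .lastIndex
console.log('##-#'.search(re)); // 0
console.log(re.lastIndex);      // 2 (unchanged)

// .split() splits at every match, regardless of /g and .lastIndex
console.log('##-#'.split(re));  // [ '', '', '-', '' ]
console.log(re.lastIndex);      // 2 (unchanged)

// Without /g, the results are the same
console.log('##-#'.search(/#/)); // 0
console.log('##-#'.split(/#/));  // [ '', '', '-', '' ]
```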

This table explains how the remaining regular-expression-related methods are affected by these two flags:

                      /          /g               /y         /yg
r.exec(s)             {i:0}      {i:1}            {i:1}      {i:1}
                      .lI unch   .lI upd          .lI upd    .lI upd
r.test(s)             true       true             true       true
                      .lI unch   .lI upd          .lI upd    .lI upd
s.match(r)            {i:0}      ["#","#","#"]    {i:1}      ["#","#"]
                      .lI unch   .lI reset        .lI upd    .lI reset
s.matchAll(r)         TypeError  [{i:1}, {i:3}]   TypeError  [{i:1}]
                                 .lI unch                    .lI unch
s.replace(r, 'x')     "x#-#"     "xx-x"           "#x-#"     "xx-#"
                      .lI unch   .lI reset        .lI upd    .lI reset
s.replaceAll(r, 'x')  TypeError  "xx-x"           TypeError  "xx-#"
                                 .lI reset                   .lI reset

Variables:

const r = /#/; r.lastIndex = 1;
const s = '##-#';

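Abbreviations:

    • {i:N}: a match object whose property .index is N
    • .lI unch: .lastIndex is unchanged
    • .lI upd: .lastIndex is updated
    • .lI reset: .lastIndex is reset to zero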

  The Node.js script that generated the previous table

The previous table was generated via a Node.js script.

43.7 Techniques for working with regular expressions

43.7.1 Escaping arbitrary text for regular expressions

The following function escapes an arbitrary text so that it is matched verbatim if we put it inside a regular expression:

function escapeForRegExp(str) {
  return str.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&'); // (A)
}
assert.equal(escapeForRegExp('[yes?]'), String.raw`\[yes\?\]`);
assert.equal(escapeForRegExp('_g_'), String.raw`_g_`);

In line A, we escape all syntax characters. We have to be selective because the regular expression flag /u forbids many escapes – for example: \a \: \-

escapeForRegExp() has two use cases:

.replace() only lets us replace plain text once. With escapeForRegExp(), we can work around that limitation:

const plainText = ':-)';
const regExp = new RegExp(escapeForRegExp(plainText), 'ug');
assert.equal(
  ':-) :-) :-)'.replace(regExp, '🙂'), '🙂 🙂 🙂');

43.7.2 Matching everything or nothing

Sometimes, we may need a regular expression that matches everything or nothing – for example, as a default value.
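
One possible pair of such regular expressions is sketched below. The empty group /(?:)/ matches the empty string and therefore matches in every string; /.^/ can never match because, without the flag /m, the beginning of the input cannot come after a character. The names MATCH_EVERYTHING, MATCH_NOTHING and selectLines are hypothetical and only serve this illustration.

```javascript
const MATCH_EVERYTHING = /(?:)/;
const MATCH_NOTHING = /.^/;

// Hypothetical helper: keeps the lines that match `include`
// and don't match `exclude`. The defaults keep every line.
function selectLines(lines, include = MATCH_EVERYTHING, exclude = MATCH_NOTHING) {
  return lines.filter(line => include.test(line) && !exclude.test(line));
}

const lines = ['apple', 'banana', ''];
console.log(selectLines(lines));                  // [ 'apple', 'banana', '' ]
console.log(selectLines(lines, /an/));            // [ 'banana' ]
console.log(selectLines(lines, undefined, /^a/)); // [ 'banana', '' ]
```

The default values deliberately omit the flags /g and /y so that .test() is not affected by .lastIndex.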