Regular expressions (RegExp) • Exploring JavaScript (ES2025 Edition)

46 Regular expressions (`RegExp`)

Availability of features

Unless stated otherwise, each regular expression feature has been available since ES3.

46.1 Creating regular expressions

46.1.1 Literal vs. constructor

The two main ways of creating regular expressions are:

Literal: compiled statically (at load time).
```
/abc/iv
```
Constructor: compiled dynamically (at runtime).
```
new RegExp('abc', 'iv')
```

Both regular expressions have the same two parts:

The body abc – the actual regular expression.
The flags i and v. Flags configure how the pattern is interpreted. For example, i enables case-insensitive matching. A list of available flags is given later in this chapter.

Recommended: flag /v ^ES2024

Flag /v enables important features and is recommended for all regular expressions.

46.1.2 Tip: Use `String.raw` literals with `new RegExp()`

If we use a normal string literal as an argument for new RegExp(), every regular expression backslash has to be escaped. If we use String.raw as a template tag then that’s not necessary. The following three regular expressions are all equivalent:

> /^\*$/.test('*')
true
> new RegExp('^\\*$', 'v').test('*')
true
> new RegExp(String.raw`^\*$`, 'v').test('*')
true

If we use String.raw often, we can abbreviate it:

// “Import” from namespace `String`
const {raw} = String;

const regExp = new RegExp(raw`^\*$`, 'v');

46.1.3 Cloning and non-destructively modifying regular expressions

There are two variants of the constructor RegExp():

new RegExp(pattern : string, flags = '') ^ES3

A new regular expression is created as specified via pattern. If flags is missing, the empty string '' is used.

new RegExp(regExp : RegExp, flags = regExp.flags) ^ES6

regExp is cloned. If flags is provided, then it determines the flags of the clone.

The second variant is useful for cloning regular expressions, optionally while modifying them. Flags are immutable and this is the only way of changing them – for example:

function copyAndAddFlags(regExp, flagsToAdd='') {
  // The constructor doesn’t allow duplicate flags;
  // make sure there aren’t any:
  const newFlags = Array.from(
    new Set(regExp.flags + flagsToAdd)
  ).join('');
  return new RegExp(regExp, newFlags);
}
assert.equal(/abc/i.flags, 'i');
assert.equal(copyAndAddFlags(/abc/i, 'g').flags, 'gi');

new Set() iterates over its argument, which means that a string is split into code points. Those code points become Set elements, which eliminates duplicates. A Set has no method .join(), so we convert it to an Array via Array.from() before joining the flags into a string.
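The same constructor variant also lets us go in the other direction. The following is a minimal sketch that removes flags from a clone; copyAndRemoveFlags is a hypothetical helper name chosen for this illustration, not a built-in.

```javascript
// Hypothetical helper: returns a clone of `regExp` without the given flags.
function copyAndRemoveFlags(regExp, flagsToRemove = '') {
  const newFlags = Array.from(regExp.flags)
    .filter((flag) => !flagsToRemove.includes(flag))
    .join('');
  return new RegExp(regExp, newFlags);
}

const original = /abc/gi;
const clone = copyAndRemoveFlags(original, 'g');
console.log(clone.flags); // 'i'
console.log(clone.source); // 'abc'
console.log(original.flags); // 'gi' (the original is unchanged)
```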

46.2 Syntax characters and escaping

46.2.1 Syntax characters

At the top level of a regular expression, the following Syntax Characters are special. They are escaped by prefixing a backslash (\).

^ $ \ . * + ? ( ) [ ] { } |

This is an example:

> /\*/v.test('*')
true

In regular expression literals, we must escape slashes:

> /\//v.test('/')
true

In the argument of new RegExp(), we don’t have to escape slashes:

> new RegExp('/', 'v').test('/')
true
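When a pattern is assembled from arbitrary text, the syntax characters in that text have to be escaped first. The following sketch shows one way of doing that for top-level use with new RegExp(); escapeSyntaxCharacters is a hypothetical helper name and the sketch makes no attempt to cover the additional rules inside character classes.

```javascript
// Hypothetical helper: prefixes every top-level syntax character
// with a backslash so that the result matches `text` literally.
function escapeSyntaxCharacters(text) {
  return text.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
}

const userInput = '1+1=2 (really?)';
const regExp = new RegExp('^' + escapeSyntaxCharacters(userInput) + '$');

console.log(regExp.test(userInput)); // true
// Without escaping, the dot would match any character:
console.log(new RegExp('^a.b$').test('aXb')); // true
console.log(new RegExp('^' + escapeSyntaxCharacters('a.b') + '$').test('aXb')); // false
```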

46.2.2 Illegal top-level escaping

With a Unicode flag (/u or /v), escaping a non-syntax character at the top level is a syntax error. That enables syntax such as \p{···}.

assert.throws(
  () => eval(String.raw`/\a/v`),
  {
    name: 'SyntaxError',
    message: 'Invalid regular expression: /\\a/v: Invalid escape',
  }
);
assert.throws(
  () => eval(String.raw`/\-/v`),
  {
    name: 'SyntaxError',
    message: 'Invalid regular expression: /\\-/v: Invalid escape',
  }
);

Without the flags /u and /v, an escaped non-syntax character at the top level matches itself:

> /\a/.test('a')
true

46.2.3 Escaping inside character classes (`[···]`)

The rules for escaping inside character classes are different if we use the recommended flag /v. We look at those first before we look at the rules for regular expressions that have no Unicode flag (neither /u nor /v).

46.2.3.1 Escaping inside character classes: flag `/v`

The following characters can be escaped via backslash:

Syntax Characters:
```
^ $ \ . * + ? ( ) [ ] { } |
```
Slash:
```
/
```
Class Set Reserved Punctuators:
```
& - ! # % , : ; < = > @ ` ~
```

Interestingly, we don’t always have to escape these characters. Only the following sequences of characters don’t match themselves and have to be escaped:

A single ^ only has to be escaped if it comes first.
Class Set Syntax Characters must always be escaped:
```
( ) [ ] { } / - \ |
```
Class Set Reserved Double Punctuators must always be escaped (at least one of them):
```
&& !! ## $$ %% ** ++ ,, .. :: ;; << == >> ?? @@ ^^ `` ~~
```

46.2.3.2 Escaping inside character classes: no Unicode flag (neither `/u` nor `/v`)

We always must escape: \ ]
Some characters only have to be escaped in some locations:
- ^ only has to be escaped if it comes first.
- - only has to be escaped if it doesn’t come first or last.

46.3 Regular expressions: the concept of a character

In the context of regular expressions, “character” means “atomic unit of text”:

Without the Unicode flags (/u and /v), a character is a JavaScript character (a UTF-16 code unit).
With a Unicode flag, a character is a code point.

For example, a dot (.) matches a single character:

> '🙂'.match(/./g) // code point with 2 code units
[ '\uD83D', '\uDE42' ]
> '🙂'.match(/./gv)
[ '🙂' ]
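The difference becomes tangible if we count how many characters the dot sees. This is a small sketch; countDotMatches is a hypothetical helper, and it uses flag /u so that it also runs on engines without /v (for the dot, both flags behave the same).

```javascript
// Hypothetical helper: counts how many "characters" the dot matches in `text`.
function countDotMatches(text, flags) {
  const matches = text.match(new RegExp('.', flags));
  return matches === null ? 0 : matches.length;
}

const text = 'a\u{1F642}b'; // 'a', slightly smiling face, 'b'
console.log(text.length); // 4 (UTF-16 code units)
console.log(countDotMatches(text, 'g')); // 4 (code units)
console.log(countDotMatches(text, 'gu')); // 3 (code points)
```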

46.4 Syntax: matching single characters

These constructs match single characters:

Pattern characters are all characters except syntax characters (^, $, etc.). Pattern characters match themselves. Examples: A b % -
. matches any character. We can use flag /s (dotAll) to control if the dot matches line terminators or not.
Character escapes (each escape matches a single fixed character):
- Control escapes (for a few control characters):
  - \f: form feed (FF)
  - \n: line feed (LF)
  - \r: carriage return (CR)
  - \t: character tabulation
  - \v: line tabulation
- Control characters: \cA (Ctrl-A), …, \cZ (Ctrl-Z)
- Hex escape (first 256 Unicode code points): \x20 (space)
- Unicode code unit escapes: \u00E4 (ä)
- Unicode code point escapes (require Unicode flag /u or /v): \u{1F642} (🙂)
- Identity escapes match the escaped characters (these are the rules with a Unicode flag /u or /v; without them, most characters can be identity-escaped):
  - We can escape the following syntax characters by prefixing them with backslashes:
```
^ $ \ . * + ? ( ) [ ] { } |
```
  - We can also escape a slash like this: \/

46.5 Syntax: character class escapes

Character class escapes match sets of code units, sets of code points or sets of code point sequences.

46.5.1 Basic character class escapes (sets of code units): `\d \D \s \S \w \W`

The following character class escapes and their complements are always supported:

	Escape	Equivalent	Complement
Digits	`\d`	`[0-9]`	`\D`
“Word” characters	`\w`	`[a-zA-Z0-9_]`	`\W`
Whitespace	`\s`		`\S`

Note:

Whitespace: \s matches all whitespace code points: space, tab, line terminators, etc. They all fit into single UTF-16 code units.
“Word” characters are related to identifiers in programming languages.

Examples:

> 'a7x4'.match(/\d/g)
[ '7', '4' ]
> 'a7x4'.match(/\D/g)
[ 'a', 'x' ]
> 'high - low'.match(/\w+/g)
[ 'high', 'low' ]
> 'hello\t\n everyone'.replaceAll(/\s/g, '-')
'hello---everyone'

46.5.2 Unicode property escapes: `\p{}` and `\P{}`

Unicode property escapes look like this:

Positive escape: \p{UP} matches Unicode characters or Unicode strings that have the Unicode property UP.
Negative escape: \P{UP} matches Unicode characters that do not have the Unicode property UP.

There are two kinds of Unicode properties:

Unicode character properties are properties of code points. They specify sets of code points.
- Example: White_Space
- Supported by flag /u and flag /v
- Introduced in ES2018
Unicode string properties are properties of sequences of code points. They specify sets of code point strings. String property escapes can only be positive.
- Example: RGI_Emoji
- Only supported by flag /v
- Introduced in ES2024

A quick example before we look at everything in more detail. This is how we can escape the Unicode character property White_Space:

// Match all code points that are whitespace
assert.deepEqual(
  'a\tb c'.match(/\p{White_Space}/gv),
  ['\t', ' ']
);

// Match all code points that are not whitespace
assert.deepEqual(
  'a\tb c'.match(/\P{White_Space}/gv),
  ['a', 'b', 'c']
);

46.5.3 Unicode character property escapes ^ES2018

With flag /u or flag /v, we can use \p{} and \P{} to specify sets of code points via Unicode character properties (we’ll learn more about those in the next subsection). That looks like this:

1. \p{prop=value}: matches all characters whose Unicode character property prop has the value value.
2. \P{prop=value}: matches all characters that do not have a Unicode character property prop whose value is value.
3. \p{bin_prop}: matches all characters whose binary Unicode character property bin_prop is True.
4. \P{bin_prop}: matches all characters whose binary Unicode character property bin_prop is False.

Comments:

Forms (3) and (4) can be used as abbreviations if the property is General_Category. For example, the following two escapes are equivalent:
```
\p{Uppercase_Letter}
\p{General_Category=Uppercase_Letter}
```
Without the flags /u and /v, \p is the same as p.

Examples:

Checking for whitespace:

> /^\p{White_Space}+$/v.test('\t \n\r')
true

Checking for Greek letters:

> /^\p{Script=Greek}+$/v.test('μετά')
true

Deleting any letters:

> '1π2ü3é4'.replace(/\p{Letter}/gv, '')
'1234'

Deleting lowercase letters:

> 'AbCdEf'.replace(/\p{Lowercase_Letter}/gv, '')
'ACE'
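Property escapes are also handy for extracting text in a script-aware way. The following sketch uses an arbitrary sample string and flag /u (flag /v would work the same way here).

```javascript
const text = 'Hello мир φως 42';

// Sequences of letters, regardless of script
const words = text.match(/\p{Letter}+/gu);
console.log(words); // [ 'Hello', 'мир', 'φως' ]

// Only sequences of Cyrillic characters
const cyrillic = text.match(/\p{Script=Cyrillic}+/gu);
console.log(cyrillic); // [ 'мир' ]

// Remove everything that is not a letter
const lettersOnly = text.replace(/\P{Letter}/gu, '');
console.log(lettersOnly); // 'Helloмирφως'
```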

46.5.3.1 Unicode character properties

In the Unicode standard, each character has properties – metadata describing it. Properties play an important role in defining the nature of a character. Quoting the Unicode Standard, Sect. 3.3, D3:

The semantics of a character are determined by its identity, normative properties, and behavior.

These are a few examples of properties:

Name: a unique name, composed of uppercase letters, digits, hyphens, and spaces – for example:
- A: Name = LATIN CAPITAL LETTER A
- 🙂: Name = SLIGHTLY SMILING FACE
General_Category: categorizes characters – for example:
- x: General_Category = Lowercase_Letter
- 3: General_Category = Number
- $: General_Category = Currency_Symbol
White_Space: used for marking invisible spacing characters, such as spaces, tabs and newlines – for example:
- \t: White_Space = True
- π: White_Space = False
Age: version of the Unicode Standard in which a character was introduced – for example: The Euro sign € was added in version 2.1 of the Unicode standard.
- €: Age = 2.1
Block: a contiguous range of code points. Blocks don’t overlap and their names are unique. For example:
- S: Block = Basic_Latin (range 0x0000..0x007F)
- 🙂: Block = Emoticons (range 0x1F600..0x1F64F)
Script: a collection of characters used by one or more writing systems.
- Some scripts support several writing systems. For example, the Latin script supports the writing systems English, French, German, Latin, etc.
- Some languages can be written in multiple alternate writing systems that are supported by multiple scripts. For example, Turkish used the Arabic script before it transitioned to the Latin script in the early 20th century.
- Examples:
  - α: Script = Greek
  - Д: Script = Cyrillic

46.5.4 Unicode string property escapes ^ES2024

Only flag /v lets us use \p{} to specify sets of code point sequences via Unicode string properties (negation via \P{} is not supported). For example, RGI_Emoji is a Unicode string property:

> /^\p{RGI_Emoji}$/v.test('⛔') // 1 code point (1 code unit)
true
> /^\p{RGI_Emoji}$/v.test('🙂') // 1 code point (2 code units)
true
> /^\p{RGI_Emoji}$/v.test('😵‍💫') // 3 code points
true

Let’s see how the Unicode character property Emoji does with these inputs:

> /^\p{Emoji}$/v.test('⛔') // 1 code point (1 code unit)
true
> /^\p{Emoji}$/v.test('🙂') // 1 code point (2 code units)
true
> /^\p{Emoji}$/v.test('😵‍💫') // 3 code points
false

As expected, it only matches single code points.

46.5.4.1 Unicode string properties

For now, only the following Unicode string properties are supported by JavaScript:

Basic_Emoji: single code points
Emoji_Keycap_Sequence
RGI_Emoji_Modifier_Sequence
RGI_Emoji_Flag_Sequence
RGI_Emoji_Tag_Sequence
RGI_Emoji_ZWJ_Sequence
RGI_Emoji: union of all of the above sets

The semantics of Unicode string properties are defined in text files that enumerate code point sequences like this (\x{23} is #):

0023 FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: \x{23}
002A FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: *
0030 FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: 0
0031 FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: 1
0032 FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: 2
0033 FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: 3
0034 FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: 4
0035 FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: 5
0036 FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: 6
0037 FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: 7
0038 FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: 8
0039 FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: 9

46.6 Syntax: character classes

A character class wraps class ranges in square brackets. The class ranges specify a set of characters:

[«class ranges»] matches any character in the set.
[^«class ranges»] matches any character not in the set.

Rules for class ranges:

Non-syntax characters stand for themselves: [abc]
What characters must be escaped depends on the flags:
- Flag /v: Some of the following characters only have to be escaped if they appear twice, but it’s easier to always escape them. For details, see “Escaping inside character classes ([···])” (§46.2.3).
  - Syntax Characters:
```
^ $ \ . * + ? ( ) [ ] { } |
```
  - Slash:
```
/
```
  - Class Set Reserved Punctuators:
```
& - ! # % , : ; < = > @ ` ~
```
- No Unicode flag (neither /v nor /u):
  - We always must escape: \ ]
  - Some characters only have to be escaped in some locations:
    - ^ only has to be escaped if it comes first.
    - - only has to be escaped if it doesn’t come first or last.
Character escapes (\n, \x20, \u{1F44D}, etc.) have the usual meanings.
- Watch out: \b stands for backspace. Elsewhere in a regular expression, it matches word boundaries.
Character class escapes (\d, \P{White_Space}, \p{RGI_Emoji}, etc.) have the usual meanings.
A range of characters is specified via a dash: [a-z]
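The two meanings of \b are easy to confuse, so here is a short sketch that contrasts them (the sample strings are arbitrary):

```javascript
const backspace = '\b'; // U+0008

// Inside a character class, \b is the backspace character
console.log(/^[\b]$/.test(backspace)); // true

// Outside a character class, \b is a word boundary assertion
console.log(/^\b$/.test(backspace)); // false
console.log(/\bcat\b/.test('a cat sat')); // true
console.log(/\bcat\b/.test('concatenate')); // false
```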

46.6.1 Code point sequences in character classes ^ES2024

With flag /v, we can use \q{} to add code point sequences to the set defined by a character class. This syntax is also called:

Class string disjunction (by the ECMAScript specification)
String literal (by the ECMAScript proposal for flag /v)

This is an example of using \q{}:

> /^[\q{😵‍💫}]$/v.test('😵‍💫')
true

Without \q{}, a grapheme cluster with multiple code points is still treated as multiple characters:

> /^[😵‍💫]$/v.test('😵‍💫')
false
> /^[\u{1F635}\u{200D}\u{1F4AB}]$/v.test('😵‍💫') // equivalent
false
> /^[😵‍💫]$/v.test('\u{1F635}')
true

We can use a single \q{} to add multiple code point sequences – if we separate them with pipes:

> /^[\q{abc|def}]$/v.test('abc')
true
> /^[\q{abc|def}]$/v.test('def')
true

46.6.2 Set operations for character classes ^ES2024

Flag /v enables set operations for character classes.

46.6.2.1 Nesting character classes

To enable set operations for character classes, we must be able to nest them. Character class escapes already provide some kind of nesting:

> /^[\d\w]$/v.test('7')
true
> /^[\d\w]$/v.test('H')
true
> /^[\d\w]$/v.test('?')
false

With flag /v, we can additionally nest character classes (the regular expression below is equivalent to the regular expression in the previous example):

> /^[[0-9][A-Za-z0-9_]]$/v.test('7')
true
> /^[[0-9][A-Za-z0-9_]]$/v.test('H')
true
> /^[[0-9][A-Za-z0-9_]]$/v.test('?')
false

46.6.2.2 Subtraction of character sets via `--`

We can use the -- operator to set-theoretically subtract the character sets defined by character classes or character class escapes:

> /^[\w--[a-g]]$/v.test('a')
false
> /^[\w--[a-g]]$/v.test('h')
true

> /^[\p{Number}--[0-9]]$/v.test('٣')
true
> /^[\p{Number}--[0-9]]$/v.test('3')
false

> /^[\p{RGI_Emoji}--\q{😵‍💫}]$/v.test('😵‍💫') // emoji has 3 code points
false
> /^[\p{RGI_Emoji}--\q{😵‍💫}]$/v.test('🙂')
true

Single code points can also be used on either side of the -- operator:

> /^[\w--a]$/v.test('a')
false
> /^[\w--a]$/v.test('b')
true

46.6.2.3 Intersection of character sets via `&&`

We can use the && operator to set-theoretically intersect the character sets defined by character classes or character class escapes:

> /[\p{ASCII}&&\p{Letter}]/v.test('D')
true
> /[\p{ASCII}&&\p{Letter}]/v.test('Δ')
false

> /^[\p{Script=Arabic}&&\p{Number}]$/v.test('٣')
true
> /^[\p{Script=Arabic}&&\p{Number}]$/v.test('ج')
false

46.6.2.4 Union of character sets

To compute the set-theoretical union of character sets, we only need to write their defining constructs next to each other inside a character class:

> /^[\p{Emoji_Keycap_Sequence}[a-z]]+$/v.test('a2️⃣c')
true

46.7 Syntax: capture groups

Numbered capture group: (a+)
- Backreference: \1, \2, etc.
Named capture group ^ES2018: (?<as>a+)
- Backreference: \k<as>
Noncapturing group: (?:a+)
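The following sketch shows each kind of group in action. The patterns and inputs are made up for illustration.

```javascript
// Numbered and named capture groups
const dateRegExp = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;
const match = dateRegExp.exec('2025-06-30');
console.log(match[1]); // '2025' (named groups are also numbered)
console.log(match.groups.month); // '06'

// Numbered backreference: the same word twice in a row
const doubledWord = /\b(\w+) \1\b/;
console.log(doubledWord.test('it is is fine')); // true
console.log(doubledWord.test('this is fine')); // false

// Named backreference: closing quote must equal opening quote
const quoted = /(?<quote>['"]).*?\k<quote>/;
console.log(quoted.exec(`say "hi" now`)[0]); // '"hi"'

// A noncapturing group groups without creating a capture
console.log(/(?:ab)+(c)/.exec('ababc')[1]); // 'c'
```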

46.8 Syntax: quantifiers

By default, all of the following quantifiers are greedy (they match as many characters as possible):

?: match never or once
*: match zero or more times
+: match one or more times
{n}: match n times
{n,}: match n or more times
{n,m}: match at least n times, at most m times.

To make them reluctant (so that they match as few characters as possible), put question marks (?) after them:

> /X.*X/.exec('XabcXdefX')[0]  // greedy
'XabcXdefX'
> /X.*?X/.exec('XabcXdefX')[0] // reluctant
'XabcX'
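A few more quick checks illustrate the counted quantifiers and the difference between greedy and reluctant matching (all inputs are arbitrary):

```javascript
console.log(/^colou?r$/.test('color')); // true (the u is optional)
console.log(/^\d{4}$/.test('2025')); // true (exactly 4 digits)
console.log(/^\d{2,}$/.test('7')); // false (at least 2 digits required)
console.log(/^\d{2,4}$/.test('12345')); // false (at most 4 digits allowed)

// Greedy vs. reluctant with +
console.log('<a><b>'.match(/<.+>/)[0]); // '<a><b>'
console.log('<a><b>'.match(/<.+?>/)[0]); // '<a>'
```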

46.9 Syntax: assertions

^ matches only at the beginning of the input
$ matches only at the end of the input
\b matches only at a word boundary
- \B matches only when not at a word boundary

Overview of available lookaround assertions:

Pattern	Name	ES
`(?=«pattern»)`	Positive lookahead	ES3
`(?!«pattern»)`	Negative lookahead	ES3
`(?<=«pattern»)`	Positive lookbehind	ES2018
`(?<!«pattern»)`	Negative lookbehind	ES2018

46.9.1 Lookahead assertions

Positive lookahead: (?=«pattern») matches if pattern matches what comes next.

Example: sequences of lowercase letters that are followed by an X.

> 'abcX def'.match(/[a-z]+(?=X)/g)
[ 'abc' ]

Note that the X itself is not part of the matched substring.

Negative lookahead: (?!«pattern») matches if pattern does not match what comes next.

Example: sequences of lowercase letters that are not followed by an X.

> 'abcX def'.match(/[a-z]+(?!X)/g)
[ 'ab', 'def' ]

46.9.2 Lookbehind assertions ^ES2018

Positive lookbehind: (?<=«pattern») matches if pattern matches what came before.

Example: sequences of lowercase letters that are preceded by an X.

> 'Xabc def'.match(/(?<=X)[a-z]+/g)
[ 'abc' ]

Negative lookbehind: (?<!«pattern») matches if pattern does not match what came before.

Example: sequences of lowercase letters that are not preceded by an X.

> 'Xabc def'.match(/(?<!X)[a-z]+/g)
[ 'bc', 'def' ]

Example: replace “.js” with “.html”, but not in “Node.js”.

> 'Node.js: index.js and main.js'.replace(/(?<!Node)\.js/g, '.html')
'Node.js: index.html and main.html'
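Lookbehind and lookahead can also be combined into a match that consumes no characters at all. The following sketch inserts thousands separators; addThousandsSeparators is a hypothetical helper that assumes its argument consists only of decimal digits.

```javascript
// Hypothetical helper: '1234567' becomes '1,234,567'.
// Assumes that `digits` contains nothing but decimal digits.
function addThousandsSeparators(digits) {
  // Match the empty string at positions that have a digit before them
  // and a multiple of three digits between them and the end of the input.
  return digits.replace(/(?<=\d)(?=(?:\d{3})+$)/g, ',');
}

console.log(addThousandsSeparators('1234567')); // '1,234,567'
console.log(addThousandsSeparators('1234')); // '1,234'
console.log(addThousandsSeparators('123')); // '123'
```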

46.10 Syntax: disjunction (`|`)

Caveat: this operator has low precedence (binds very weakly). Use groups if necessary:

^aa|zz$ matches all strings that start with aa and/or end with zz.
- Note that | has a lower precedence than ^ and $.
^(aa|zz)$ matches the two strings 'aa' and 'zz'.
^a(a|z)z$ matches the two strings 'aaz' and 'azz'.
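The effect of the low precedence can be checked directly. In this sketch, the variable names are only for illustration:

```javascript
// Parsed as (^aa)|(zz$)
const withoutGroup = /^aa|zz$/;
console.log(withoutGroup.test('aaXYZ')); // true (starts with aa)
console.log(withoutGroup.test('XYZzz')); // true (ends with zz)

// The group limits the scope of |
const withGroup = /^(?:aa|zz)$/;
console.log(withGroup.test('aaXYZ')); // false
console.log(withGroup.test('zz')); // true
```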

46.11 Regular expression flags

Literal flag	Property name	ES	Description
`d`	`hasIndices`	ES2022	Switch on match indices
`g`	`global`	ES3	Match multiple times
`i`	`ignoreCase`	ES3	Match case-insensitively
`m`	`multiline`	ES3	`^` and `$` match per line
`s`	`dotAll`	ES2018	Dot matches line terminators
`u`	`unicode`	ES6	Unicode mode
`v`	`unicodeSets`	ES2024	Unicode sets mode (recommended)
`y`	`sticky`	ES6	No characters between matches

Table 46.1: These are the regular expression flags supported by JavaScript.

The following regular expression flags are available in JavaScript (table 46.1 provides a compact overview):

/d (.hasIndices): Some RegExp-related methods return match objects that describe where the regular expression matched in an input string. If this flag is on, each match object includes match indices which tell us where each group capture starts and ends. More information: “Match indices in match objects ^ES2022” (§46.14.1).
/g (.global) fundamentally changes how the following methods work.
- String.prototype.match()
- RegExp.prototype.exec()
- RegExp.prototype.test()
How they change is explained in “The flags /g and /y, and the property .lastIndex (advanced)” (§46.16). In a nutshell: Without /g, the methods only consider the first match for a regular expression in an input string. With /g, they consider all matches.
/i (.ignoreCase) switches on case-insensitive matching:
```
> /a/.test('A')
false
> /a/i.test('A')
true
```
/m (.multiline): If this flag is on, ^ matches the beginning of each line and $ matches the end of each line. If it is off, ^ matches the beginning of the whole input string and $ matches the end of the whole input string.
```
> 'a1\na2\na3'.match(/^a./gm)
[ 'a1', 'a2', 'a3' ]
> 'a1\na2\na3'.match(/^a./g)
[ 'a1' ]
```
/s (.dotAll): By default, the dot does not match line terminators. With this flag, it does:
```
> /./.test('\n')
false
> /./s.test('\n')
true
```
Workaround: If /s isn’t supported, we can use [^] instead of a dot.
```
> /[^]/.test('\n')
true
```
Better support for Unicode (the “Unicode flags”):
- /u (.unicode): By default, the atomic units of matching are JavaScript characters (UTF-16 code units). This flag switches to Unicode code points as atomic units. It is explained in “Flag /u: characters are code points ^ES6” (§46.11.3).
- /v (.unicodeSets): This flag improves and replaces flag /u. It supports multi-code-point grapheme clusters and set operations in character classes. It is explained in “Flag /v: limited support for multi-code-point grapheme clusters ^ES2024” (§46.11.4). I recommend using flag /v with all regular expressions – due to all the features it enables.
/y (.sticky): This flag mainly makes sense in conjunction with /g. When both are switched on, any match must directly follow the previous one (that is, it must start at index .lastIndex of the regular expression object). Therefore, the first match must be at index 0.
```
> 'a1a2 a3'.match(/a./gy)
[ 'a1', 'a2' ]
> '_a1a2 a3'.match(/a./gy) // first match must be at index 0
null

> 'a1a2 a3'.match(/a./g)
[ 'a1', 'a2', 'a3' ]
> '_a1a2 a3'.match(/a./g)
[ 'a1', 'a2', 'a3' ]
```
The main use case for /y is tokenization (during parsing). More information on this flag: “The flags /g and /y, and the property .lastIndex (advanced)” (§46.16).
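To give an impression of what tokenization via /y can look like, here is a minimal sketch for a tiny arithmetic language. The function tokenize and the shape of the token objects are assumptions made for this illustration. The sketch uses /y without /g and calls .exec() repeatedly, which continues at .lastIndex each time.

```javascript
// Hypothetical tokenizer for non-negative integers and the operators + - * / ( )
function tokenize(input) {
  // Group 1: number. Group 2: operator. Surrounding whitespace is skipped.
  const tokenRegExp = /\s*(?:(\d+)|([-+*\/()]))\s*/y;
  const tokens = [];
  while (tokenRegExp.lastIndex < input.length) {
    const start = tokenRegExp.lastIndex;
    // Due to /y, the match must begin exactly at .lastIndex
    const match = tokenRegExp.exec(input);
    if (match === null) {
      throw new SyntaxError(`Unexpected input at index ${start}`);
    }
    if (match[1] !== undefined) {
      tokens.push({type: 'number', value: Number(match[1])});
    } else {
      tokens.push({type: 'operator', value: match[2]});
    }
  }
  return tokens;
}

console.log(tokenize('12 + 3*(4)').map((token) => token.value));
// [ 12, '+', 3, '*', '(', 4, ')' ]
```

Without /y, .exec() would be free to skip over characters that are not part of any token. With /y, such input leads to a failed match and therefore to an error.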

46.11.1 How to order regular expression flags?

Consider the following regular expression: /“([^”]+)”/vdg

In which order should we list its flags? Two options are:

1. Alphabetical order: /dgv
2. In order of importance (arguably, /v is most fundamental etc.): /vgd

Given that (2) is not obvious, (1) is the better choice. JavaScript also uses it for the RegExp property .flags :

> /-/gymdivs.flags
'dgimsvy'

46.11.2 Without the Unicode flags `/u` and `/v`: characters are UTF-16 code units

Without the Unicode flags /u and /v, most constructs work with single UTF-16 code units, which is problematic whenever a character has more than one code unit – e.g. 🙂:

> '🙂'.length
2

Without Unicode flags, the largest character we can escape is a code unit, via a code unit escape. It consists of \u followed by four hexadecimal digits:

> /^\uD83D\uDE42$/.test('🙂')
true

The dot operator (.) matches code units, which is why we get two matches instead of one:

> '🙂'.match(/./g)
[ '\uD83D', '\uDE42' ]

Quantifiers apply to code units and therefore only repeat the second half of 🙂:

> /^🙂{2}$/.test('\uD83D\uDE42\uDE42')
true
> /^\uD83D\uDE42{2}$/.test('\uD83D\uDE42\uDE42') // equivalent
true

Character class escapes define sets of code units. Therefore, the class escape \D for “not a decimal digit” gets two matches:

> '🙂'.match(/\D/g)
[ '\uD83D', '\uDE42' ]

Character classes define sets of code units. Therefore, putting 🙂 in a character class has unintuitive consequences:

> /^[🙂]$/.test('🙂')
false
> /^[\uD83D\uDE42]$/.test('\uD83D\uDE42') // equivalent
false
> /^[🙂]$/.test('\uD83D')
true

46.11.3 Flag `/u`: characters are code points ^ES6

In the previous subsection, we encountered problems when we wanted to match a code point with more than one UTF-16 code unit – such as 🙂. Flag /u enables support for code points and fixes those problems.

We can escape code points via code point escapes – \u{} with one to six hexadecimal digits:

> /^\u{1F642}$/u.test('🙂')
true

The dot operator (.) matches code points:

> '🙂'.match(/./gu)
[ '🙂' ]

Quantifiers apply to code points:

> /^🙂{2}$/u.test('🙂🙂')
true

Character class escapes define sets of code points:

> '🙂'.match(/\D/gu)
[ '🙂' ]

A new kind of character class escapes is supported – Unicode character property escapes specify sets of code points:

> /^\p{Emoji}$/u.test('⛔') // 1 code point (1 code unit)
true
> /^\p{Emoji}$/u.test('🙂') // 1 code point (2 code units)
true

Character classes also define sets of code points:

> /^[🙂]$/u.test('🙂')
true
> /^[🙂]$/u.test('\uD83D')
false

46.11.4 Flag `/v`: limited support for multi-code-point grapheme clusters ^ES2024

Use flag /v whenever you can

This flag improves many aspects of JavaScript’s regular expressions and should be used by default. If a platform doesn’t support it yet, you should at least use /u.

Flag /v builds on the improvements brought by flag /u and fixes several of its shortcomings.
Note that flag /v and flag /u are mutually exclusive – we can’t use both at the same time:
```
assert.throws(
  () => eval('/-/uv'),
  SyntaxError
);
```
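If code has to run on platforms that may not support flag /v yet, one option is to detect support at runtime and fall back to /u. The following is a sketch under that assumption; supportsFlag is a hypothetical helper that relies on new RegExp() throwing a SyntaxError for unknown flags.

```javascript
// Hypothetical helper: does this engine accept the given flag?
function supportsFlag(flag) {
  try {
    new RegExp('', flag);
    return true;
  } catch {
    return false;
  }
}

// Prefer /v, fall back to /u
const unicodeFlag = supportsFlag('v') ? 'v' : 'u';
const lettersOnly = new RegExp(String.raw`^\p{Letter}+$`, unicodeFlag);

console.log(lettersOnly.test('abc')); // true
console.log(lettersOnly.test('abc1')); // false
```

This only works for patterns that are valid and mean the same thing with both flags. Patterns that use /v-only features such as \q{} or set operations have no /u equivalent.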

46.11.4.1 Limitation of flag `/u`: handling grapheme clusters with more than one code point

Some font glyphs are represented by grapheme clusters (code point sequences) with more than one code point – e.g. 😵‍💫:

> Array.from('😵‍💫').length // count code points
3

Flag /u does not help us with those kinds of grapheme clusters:

// Grapheme cluster is not matched by single dot
assert.equal(
  '😵‍💫'.match(/./gu).length, 3
);

// Quantifiers only repeat last code point of grapheme cluster
assert.equal(
  /^😵‍💫{2}$/u.test('😵‍💫😵‍💫'), false
);

// Character class escapes only match single code points
assert.equal(
  /^\p{Emoji}$/u.test('😵‍💫'), false
);

// Character classes only match single code points
assert.equal(
  /^[😵‍💫]$/u.test('😵‍💫'), false
);

46.11.4.2 Flag `/v`: Unicode string property escapes and character class string literals

Flag /v works like flag /u but provides better support for multi-code-point grapheme clusters. It doesn’t switch from code points to grapheme clusters everywhere, but it does fix the last two issues we encountered in the previous subsection – by adding support for multi-code-point grapheme clusters to:

Character class escapes: We can refer to Unicode string properties via \p{}.

> /^\p{RGI_Emoji}$/v.test('⛔') // 1 code point (1 code unit)
true
> /^\p{RGI_Emoji}$/v.test('🙂') // 1 code point (2 code units)
true
> /^\p{RGI_Emoji}$/v.test('😵‍💫') // 3 code points
true

Character classes: \q{} lets us define code point sequences.
```
> /^[\q{😵‍💫}]$/v.test('😵‍💫')
true
```

46.11.4.3 Flag `/v`: character class set operations

Character classes can be nested and combined via the set operations subtraction and intersection – see “Set operations for character classes ^ES2024” (§46.6.2).

46.11.4.4 Flag `/v`: improved case-insensitive matching

Flag /u has a quirk when it comes to case-insensitive matching: Using \P{···} produces different results than [^\p{···}]:

> /^\P{Lowercase_Letter}$/iu.test('A')
true
> /^\P{Lowercase_Letter}$/iu.test('a')
true

> /^[^\p{Lowercase_Letter}]$/iu.test('A')
false
> /^[^\p{Lowercase_Letter}]$/iu.test('a')
false

Observations:

Both ways of negating should produce the same results.
Intuitively, if we add /i to a regular expression, it should match at least as many strings as before – not fewer.

Flag /v fixes that quirk:

> /^\P{Lowercase_Letter}$/iv.test('A')
false
> /^\P{Lowercase_Letter}$/iv.test('a')
false

> /^[^\p{Lowercase_Letter}]$/iv.test('A')
false
> /^[^\p{Lowercase_Letter}]$/iv.test('a')
false

46.12 Pattern modifiers (inline flags) ^ES2025

Pattern modifiers let us apply a flag to a part of a regular expression (vs. all of the regular expression) – for example, in the following regular expression, the flag i is only applied to “HELLO”:

> /^x(?i:HELLO)x$/.test('xHELLOx')
true
> /^x(?i:HELLO)x$/.test('xhellox')
true
> /^x(?i:HELLO)x$/.test('XhelloX')
false

46.12.1 The syntax of pattern modifiers

This is what the syntax looks like:

(?ims-ims:pattern)
(?ims:pattern)
(?-ims:pattern)

Notes:

A flag that follows the question mark (?) is activated.
A flag that follows the hyphen (-) is deactivated.
A flag cannot appear in both the “activation section” and the “deactivation section”.
Without any flags, this syntax is simply a non-capturing group: (?:pattern)

Let’s change the previous example: Now all of the regular expression is case-insensitive – except for “HELLO”:

> /^x(?-i:HELLO)x$/i.test('xHELLOx')
true
> /^x(?-i:HELLO)x$/i.test('XHELLOX')
true
> /^x(?-i:HELLO)x$/i.test('XhelloX')
false
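Because pattern modifiers are a recent addition, it can make sense to check for them before relying on them. This sketch detects support by compiling a small pattern and falls back to a more verbose pattern otherwise; supportsPatternModifiers and the example URL are assumptions made for this illustration.

```javascript
// Hypothetical helper: engines without pattern modifiers
// reject (?i:···) with a SyntaxError.
function supportsPatternModifiers() {
  try {
    new RegExp('(?i:a)');
    return true;
  } catch {
    return false;
  }
}

// Case-insensitive protocol, case-sensitive rest
const source = supportsPatternModifiers()
  ? String.raw`^(?i:https)://example\.com/Path$`
  : String.raw`^[hH][tT][tT][pP][sS]://example\.com/Path$`;
const urlRegExp = new RegExp(source);

console.log(urlRegExp.test('HTTPS://example.com/Path')); // true
console.log(urlRegExp.test('https://example.com/path')); // false
```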

46.12.2 Which flags are supported?

The following flags can be used in pattern modifiers:

Literal flag	Property name	ES	Description
i	ignoreCase	ES3	Match case-insensitively
m	multiline	ES3	`^` and `$` match per line
s	dotAll	ES2018	Dot matches line terminators

For more information, see “Regular expression flags” (§46.11).

The remaining flags are not supported because they would either make regular expression semantics too complicated (e.g. flag v) or because they only make sense if applied to the whole regular expression (e.g. flag g).

46.12.3 Use case: changing flags for part of a regular expression

It’s sometimes useful to change flags for part of a regular expression. For example, Ron Buckton explains that changing flag m helps with matching a Markdown frontmatter block at the start of a file (I slightly edited his version):

const re = /(?-m:^)---\r?\n((?:^(?!---$).*\r?\n)*)^---$/m;
assert.equal(re.test('---a'), false);
assert.equal(re.test('---\n---'), true);
assert.equal(
  re.exec('---\n---')[1],
  ''
);
assert.equal(
  re.exec('---\na: b\n---')[1],
  'a: b\n'
);

How does this regular expression work?

By default, flag m is on and the anchor ^ matches at the beginning of a line and the anchor $ matches at the end of a line.
The very first ^ is different: It must match at the beginning of a string. That’s why we use a pattern modifier there and switch flag m off.

This is the regular expression, formatted with insignificant whitespace and explanatory comments:

(?-m:^)---\r?\n  # first line of string
(  # capturing group for the frontmatter
  (?:  # pattern for one line (non-capturing group)
    ^(?!---$)  # line must not start with "---" + EOL (lookahead)
    .*\r?\n
  )*
)
^---$  # closing delimiter of frontmatter

46.12.4 Use case: inlining flags

In some situations, flags being outside the actual regular expressions is inconvenient. Then pattern modifiers help. Examples include:

Storing regular expressions in configuration files, e.g. in JSON format.
The Regex+ library provides a template literal that makes creating regular expressions much more convenient. The syntax for specifying flags adds a bit of clutter that can be avoided via pattern modifiers (if they support the required flags):
```
regex('i')`world`
regex`(?i:world)`
```

46.12.5 Use case: regular expression fragments that change flags

In complex applications, it helps if you can compose large regular expressions out of smaller regular expressions. The aforementioned Regex+ library supports that. If a smaller regular expression needs different flags (e.g. because it wants to ignore case) then it can – thanks to pattern modifiers.

46.13 Properties of regular expression objects

Noteworthy:

Strictly speaking, only .lastIndex is a real instance property. All other properties are implemented via getters.
Accordingly, .lastIndex is the only mutable property. All other properties are read-only. If we want to change them, we need to copy the regular expression (see “Cloning and non-destructively modifying regular expressions” (§46.1.3) for details).
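
The following sketch makes these two observations visible. The helper withFlags() is a hypothetical name; it merely wraps the cloning variant of the constructor.

```javascript
const re = /abc/g;

// .lastIndex is the only own (instance) property:
console.log(Object.getOwnPropertyNames(re)); // [ 'lastIndex' ]

// .global lives in the prototype and has a getter but no setter:
const desc = Object.getOwnPropertyDescriptor(RegExp.prototype, 'global');
console.log(typeof desc.get); // function
console.log(desc.set); // undefined

// .lastIndex can be assigned to:
re.lastIndex = 2;
console.log(re.lastIndex); // 2

// Hypothetical helper: "changing" flags means creating a clone
function withFlags(regExp, flags) {
  return new RegExp(regExp, flags);
}
const clone = withFlags(re, 'i');
console.log(clone.flags, re.flags); // i g
```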

46.13.1 Flags as properties

Each regular expression flag exists as a property with a longer, more descriptive name:

> /a/i.ignoreCase
true
> /a/.ignoreCase
false

This is the complete list of flag properties:

.dotAll (/s)
.global (/g)
.hasIndices (/d)
.ignoreCase (/i)
.multiline (/m)
.sticky (/y)
.unicode (/u)
.unicodeSets (/v)

46.13.2 Other properties

Each regular expression also has the following properties:

.source ^ES3: The regular expression pattern
```
> /abc/ig.source
'abc'
```
.flags ^ES6: The flags of the regular expression
```
> /abc/ig.flags
'gi'
```
.lastIndex ^ES3: Used when flag /g or flag /y is switched on. See “The flags /g and /y, and the property .lastIndex (advanced)” (§46.16) for details.

46.14 Match objects

Several regular expression-related methods return so-called match objects to provide detailed information for the locations where a regular expression matches an input string. These methods are:

RegExp.prototype.exec() returns null or a single match object.
String.prototype.match() returns null or a single match object (if flag /g is not set).
String.prototype.matchAll() returns an iterable of match objects (flag /g must be set; otherwise, an exception is thrown).

This is an example:

assert.deepEqual(
  /(a+)b/d.exec('ab aaab'),
  {
    0: 'ab',
    1: 'a',
    index: 0,
    input: 'ab aaab',
    groups: undefined,
    indices: {
      0: [0, 2],
      1: [0, 1],
      groups: undefined
    },
  }
);

The result of .exec() is a match object for the first match with the following properties:

[0]: the complete substring matched by the regular expression
[1]: capture of numbered group 1 (etc.)
.index: where did the match occur?
.input: the string that was matched against
.groups: captures of named groups (see “Named capture groups ^ES2018” (§46.15.2.1))
.indices: the index ranges of captured groups
- This property is only created if flag /d is switched on.

46.14.1 Match indices in match objects ^ES2022

Match indices are a feature of match objects: If we turn them on via the regular expression flag /d (property .hasIndices), they record the start and end indices of where groups were captured.

46.14.1.1 Match indices for numbered groups

This is how we access the captures of numbered groups:

const matchObj = /(a+)(b+)/d.exec('aaaabb');
assert.equal(
  matchObj[1], 'aaaa'
);
assert.equal(
  matchObj[2], 'bb'
);

Due to the regular expression flag /d, matchObj also has a property .indices that records for each numbered group where it was captured in the input string:

assert.deepEqual(
  matchObj.indices[1], [0, 4]
);
assert.deepEqual(
  matchObj.indices[2], [4, 6]
);

46.14.1.2 Match indices for named groups

The captures of named groups are accessed like this:

const matchObj = /(?<as>a+)(?<bs>b+)/d.exec('aaaabb');
assert.equal(
  matchObj.groups.as, 'aaaa'
);
assert.equal(
  matchObj.groups.bs, 'bb'
);

Their indices are stored in matchObj.indices.groups:

assert.deepEqual(
  matchObj.indices.groups.as, [0, 4]
);
assert.deepEqual(
  matchObj.indices.groups.bs, [4, 6]
);
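
As the examples suggest, each pair has the shape [start, end] where end is exclusive. The following sketch checks that by feeding each pair to .slice(); the variable names are only for illustration.

```javascript
const input = 'aaaabb';
const matchObj = /(?<as>a+)(?<bs>b+)/d.exec(input);

for (const [name, [start, end]] of Object.entries(matchObj.indices.groups)) {
  // end is exclusive, so .slice(start, end) recovers the capture
  const sameText = input.slice(start, end) === matchObj.groups[name];
  console.log(name, start, end, sameText);
}
// as 0 4 true
// bs 4 6 true
```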

46.14.1.3 A more realistic example

One important use case for match indices is parsers that point to where exactly a syntactic error is located. The following code solves a related problem: It points to where quoted content starts and where it ends (see demonstration at the end).

const reQuoted = /“([^”]+)”/dgv;
function pointToQuotedText(str) {
  const startIndices = new Set();
  const endIndices = new Set();
  for (const match of str.matchAll(reQuoted)) {
    const [start, end] = match.indices[1];
    startIndices.add(start);
    endIndices.add(end);
  }
  let result = '';
  for (let index=0; index < str.length; index++) {
    if (startIndices.has(index)) {
      result += '[';
    } else if (endIndices.has(index+1)) {
      result += ']';
    } else {
      result += ' ';
    }
  }
  return result;
}

assert.equal(
  pointToQuotedText(
    'They said “hello” and “goodbye”.'),
    '           [   ]       [     ]  '
);

46.15 Methods for working with regular expressions

46.15.1 By default, regular expressions match anywhere in a string

By default, regular expressions match anywhere in a string:

> /a/.test('__a__')
true

We can change that by using assertions such as ^ (or by using the flag /y):

> /^a/.test('__a__')
false
> /^a/.test('a__')
true

46.15.2 `string.match(regExp)` without `/g`: getting a match object for the first match

Without the flag /g, string.match(regExp) returns a match object for the first match of regExp in string:

assert.deepEqual(
  'ab aab'.match(/(a+)b/),
  {
    0: 'ab',
    1: 'a',
    index: 0,
    input: 'ab aab',
    groups: undefined,
  }
);

46.15.2.1 Named capture groups ^ES2018

The previous example contained a single numbered group. The following example demonstrates named groups:

assert.deepEqual(
  'ab aab'.match(/(?<as>a+)b/),
  {
    0: 'ab',
    1: 'a',
    index: 0,
    input: 'ab aab',
    groups: { as: 'a' },
  }
);

In the result of .match(), we can see that a named group is also a numbered group – its capture exists twice:

Once as a numbered capture (property '1').
Once as a named capture (property groups.as).

46.15.2.2 Duplicate named capture groups ^ES2025

Since ECMAScript 2025, we can use the same group name twice – as long as it appears in different alternatives:

const RE = /(?<a>a(?<xs>x+))|(?<b>b(?<xs>x+))/v;
assert.deepEqual(
  'axx'.match(RE).groups,
  {
    a: 'axx',
    xs: 'xx',
    b: undefined,
    __proto__: null,
  }
);
assert.deepEqual(
  'bxx'.match(RE).groups,
  {
    a: undefined,
    xs: 'xx',
    b: 'bxx',
    __proto__: null,
  }
);

The following is not allowed (group xs would match twice):

assert.throws(
  () => eval('/(?<a>a(?<xs>x+))(?<b>b(?<xs>x+))/v'),
  /^SyntaxError:.* Duplicate capture group name$/
);

46.15.3 `string.match(regExp)` with `/g`: getting all group 0 captures ^ES3

With flag /g, string.match(regExp) returns all substrings of string that match regExp:

> 'ab aab'.match(/(a+)b/g)
[ 'ab', 'aab' ]

If there is no match, .match() returns null:

> 'xyz'.match(/(a+)b/g)
null

We can use the nullish coalescing operator (??) to protect ourselves against null:

const numberOfMatches = (str.match(regExp) ?? []).length;

We can also use optional chaining (?.) and combine it with the nullish coalescing operator:

const numberOfMatches = str.match(regExp)?.length ?? 0;

46.15.4 `string.matchAll(regExp)`: getting an iterable over all match objects ^ES2020

This is how .matchAll() is invoked:

const matchIterable = str.matchAll(regExp);

Given a string and a regular expression, .matchAll() returns an iterable over the match objects of all matches.

In the following example, we use Array.from() to convert iterables to Arrays so that we can compare them better.

> Array.from('-a-a-a'.matchAll(/-(a)/gv))
[
  { 0:'-a', 1:'a', index: 0, input: '-a-a-a', groups: undefined },
  { 0:'-a', 1:'a', index: 2, input: '-a-a-a', groups: undefined },
  { 0:'-a', 1:'a', index: 4, input: '-a-a-a', groups: undefined },
]

Flag /g must be set:

> Array.from('-a-a-a'.matchAll(/-(a)/v))
TypeError: String.prototype.matchAll called with a non-global
RegExp argument

.matchAll() doesn’t change regExp.lastIndex. However, matching starts at that index (see “string.matchAll(regExp) ^ES2020” (§46.16.2.2)).

Exercise: Extracting quoted text via .matchAll()

exercises/regexps/extract_quoted_test.mjs

46.15.4.1 Implementing `.matchAll()`

A simplified version of .matchAll() could be implemented via .exec() as follows (unlike the built-in method, it always starts matching at index zero):

function* matchAll(str, regExp) {
  if (!regExp.global) {
    throw new TypeError('Flag /g must be set!');
  }
  // The copy keeps the flags; its .lastIndex starts at zero
  const localCopy = new RegExp(regExp, regExp.flags);
  let match;
  while (match = localCopy.exec(str)) {
    yield match;
  }
}

Making a local copy ensures two things:

regExp.lastIndex isn’t changed.
localCopy.lastIndex is zero.

Using matchAll():

const str = '"fee" "fi" "fo" "fum"';
const regex = /"([^"]*)"/g;

for (const match of matchAll(str, regex)) {
  console.log(match[1]);
}

Output:

fee
fi
fo
fum
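
The implementation above always starts at index zero. According to §46.16.2.2, the built-in method starts matching at regExp.lastIndex (without changing it). The following variant is a sketch that mimics that behavior and additionally guards against endless loops for empty matches. The name matchAllFrom is hypothetical and advancing by one JavaScript character is a simplification.

```javascript
function* matchAllFrom(str, regExp) {
  if (!regExp.global) {
    throw new TypeError('Flag /g must be set!');
  }
  const localCopy = new RegExp(regExp, regExp.flags);
  // Start where the original regular expression would start
  localCopy.lastIndex = regExp.lastIndex;
  let match;
  while ((match = localCopy.exec(str)) !== null) {
    yield match;
    if (match[0] === '') {
      // An empty match doesn't advance .lastIndex. We do that manually
      // to avoid an infinite loop (simplification: one code unit).
      localCopy.lastIndex++;
    }
  }
}

const re = /#/g;
re.lastIndex = 1;
const indices = Array.from(matchAllFrom('##-#', re), (m) => m.index);
console.log(indices); // [ 1, 3 ]
console.log(re.lastIndex); // 1 (unchanged)
```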

46.15.5 `regExp.exec(str)`: capturing groups ^ES3

46.15.5.1 `regExp.exec(str)` without `/g`: getting a match object for the first match

Without flag /g, regExp.exec() works like string.match() – it returns a single match object.

46.15.5.2 `regExp.exec(str)` with `/g`: looping over all matches

Better alternative for retrieving all matches: string.matchAll(regExp) ^ES2020

Since ECMAScript 2020, JavaScript has another method for retrieving all matches: string.matchAll(regExp). That method is easier and safer to use: It returns an iterable, doesn’t change .lastIndex and throws an exception if flag /g is missing.

If we want to retrieve all matches of a regular expression (not just the first one), we need to switch on the flag /g. Then we can call .exec() multiple times and get one match each time. After the last match, .exec() returns null.

> const regExp = /(a+)b/g;
> regExp.exec('ab aab')
{ 0: 'ab', 1: 'a', index: 0, input: 'ab aab', groups: undefined }
> regExp.exec('ab aab')
{ 0: 'aab', 1: 'aa', index: 3, input: 'ab aab', groups: undefined }
> regExp.exec('ab aab')
null

Therefore, we can loop over all matches as follows:

const regExp = /(a+)b/g;
const str = 'ab aab';

let match;
// Check for null via truthiness
// Alternative: while ((match = regExp.exec(str)) !== null)
while (match = regExp.exec(str)) {
  console.log(match[1]);
}

Output:

a
aa

46.15.6 `string.match()` vs. `string.matchAll()` vs. `regExp.exec()`

The following table summarizes the differences between three methods:

	Without `/g`	With `/g`
`string.match(regExp)`	First match object	Array of group 0 captures
`string.matchAll(regExp)`	`TypeError`	Iterable over match objects
`regExp.exec(string)`	First match object	Next match object or `null`

46.15.7 Replacing with `string.replace()` and `string.replaceAll()`

Both replacing methods have two parameters:

string.replace(searchValue, replacementValue)
string.replaceAll(searchValue, replacementValue)

searchValue can be:

A string
A regular expression

replacementValue can be:

String: Replace matches with this string. The character $ has special meaning and lets us insert captures of groups and more (details are explained later).
Function: Compute strings that replace matches via this function.

The two methods differ as follows:

.replace() replaces the first occurrence of a string or a regular expression without /g.
.replaceAll() replaces all occurrences of a string or a regular expression with /g.

This table summarizes how that works:

Search for: →	string	RegExp w/o `/g`	RegExp with `/g`
`.replace`	First occurrence	First occurrence	(All occurrences)
`.replaceAll`	All occurrences	`TypeError`	All occurrences

The last column of .replace() is in parentheses because this method existed long before .replaceAll() and therefore supports functionality that should now be handled via the latter method. If we could change that, .replace() would throw a TypeError here.

We first explore how .replace() and .replaceAll() work individually when replacementValue is a simple string (without the character $). Then we examine how both are affected by more complicated replacement values.

46.15.7.1 `string.replace(searchValue, replacementValue)` ^ES3

How .replace() operates is influenced by its first parameter searchValue:

Regular expression without /g: Replace first match of this regular expression.
```
> 'aaa'.replace(/a/, 'x')
'xaa'
```
String: Replace first occurrence of this string (the string is interpreted verbatim, not as a regular expression).
```
> 'aaa'.replace('a', 'x')
'xaa'
```
Regular expression with /g: Replace all matches of this regular expression.
```
> 'aaa'.replace(/a/g, 'x')
'xxx'
```
Recommendation: If .replaceAll() is available, it’s better to use that method in this case – its purpose is to replace multiple occurrences.

If we want to replace every occurrence of a string, we have two options:

We can use .replaceAll() (which was introduced in ES2021).
Later in this chapter, we will encounter the tool function escapeForRegExp() which will help us convert a string into a regular expression that matches that string multiple times (e.g., '*' becomes /\*/g).

46.15.7.2 `string.replaceAll(searchValue, replacementValue)` ^ES2021

How .replaceAll() operates is influenced by its first parameter searchValue:

Regular expression with /g: Replace all matches of this regular expression.
```
> 'aaa'.replaceAll(/a/g, 'x')
'xxx'
```
String: Replace all occurrences of this string (the string is interpreted verbatim, not as a regular expression).
```
> 'aaa'.replaceAll('a', 'x')
'xxx'
```
Regular expression without /g: A TypeError is thrown (because the purpose of .replaceAll() is to replace multiple occurrences).
```
> 'aaa'.replaceAll(/a/, 'x')
TypeError: String.prototype.replaceAll called with
a non-global RegExp argument
```

46.15.7.3 The parameter `replacementValue` of `.replace()` and `.replaceAll()`

So far, we have only used the parameter replacementValue with simple strings, but it can do more. If its value is:

A string, then matches are replaced with this string. The character $ has special meaning and lets us insert captures of groups and more (read on for details).
A function, then matches are replaced by strings that are computed via this function.

46.15.7.4 `replacementValue` is a string

If the replacement value is a string, the dollar sign has special meaning – it inserts text matched by the regular expression:

Text	Result
`$$`	single `$`
`$&`	complete match
$`	text before match
`$'`	text after match
`$n`	capture of numbered group `n` (`n` > 0)
`$<name>`	capture of named group `name` (ES2018)

Example: Inserting the text before, inside, and after the matched substring.

> 'a1 a2'.replaceAll(/a/g, "($`|$&|$')")
'(|a|1 a2)1 (a1 |a|2)2'

Example: Inserting the captures of numbered groups.

> const regExp = /^([A-Za-z]+): (.*)$/gv;
> 'first: Jane'.replaceAll(regExp, 'KEY: $1, VALUE: $2')
'KEY: first, VALUE: Jane'

Example: Inserting the captures of named groups.

> const regExp = /^(?<key>[A-Za-z]+): (?<value>.*)$/gv;
> 'first: Jane'.replaceAll(regExp, 'KEY: $<key>, VALUE: $<value>')
'KEY: first, VALUE: Jane'
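
One consequence of the table above: If a replacement string comes from elsewhere (e.g. user input), any $ in it may be interpreted. The following sketch shows $$ and one way of inserting a string verbatim, via a replacement function. The helper name replaceAllVerbatim is hypothetical.

```javascript
// $$ inserts a single dollar sign
console.log('price: X'.replaceAll(/X/g, '$$5')); // price: $5

// $& is interpreted as "complete match"
console.log('a-b'.replaceAll(/-/g, '$&$&')); // a--b

// Hypothetical helper: the result of a replacement function is
// inserted as is, so $ has no special meaning in `replacement`
function replaceAllVerbatim(str, regExp, replacement) {
  return str.replaceAll(regExp, () => replacement);
}
console.log(replaceAllVerbatim('a-b', /-/g, '$&$&')); // a$&$&b
```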

Exercise: Change quotes via .replace() and a named group

exercises/regexps/change_quotes_test.mjs

46.15.7.5 `replacementValue` is a function

If the replacement value is a function, we can compute each replacement. In the following example, we multiply each non-negative integer that we find by two.

assert.equal(
  '3 cats and 4 dogs'.replaceAll(/[0-9]+/g, (all) => 2 * Number(all)),
  '6 cats and 8 dogs'
);

The replacement function gets the following parameters. Note how similar they are to match objects. These parameters are all positional, but I’ve included how one might name them:

all: complete match
g1: capture of numbered group 1
Etc.
index: where did the match occur?
input: the string in which we are replacing
groups ^ES2018: captures of named groups (an object). Always the last parameter.
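
A minimal sketch of the positional parameters in action (the regular expression has no named groups, so there is no parameter groups; the output format is made up for this example):

```javascript
const result = 'a1 b22'.replaceAll(
  /([a-z])([0-9]+)/g,
  (all, g1, g2, index, input) => {
    // all: 'a1' or 'b22'; g1: the letter; g2: the digits
    return `${g2}${g1}@${index}/${input.length}`;
  }
);
console.log(result); // 1a@0/6 22b@3/6
```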

If we are only interested in groups, we can use the following technique:

const result = 'first=jane, last=doe'.replace(
  /(?<key>[a-z]+)=(?<value>[a-z]+)/g,
  (...args) => { // (A)
    const groups = args.at(-1); // (B)
    const {key, value} = groups;
    return key.toUpperCase() + '=' + value.toUpperCase();
  });
assert.equal(result, 'FIRST=JANE, LAST=DOE');

Due to the rest parameter in line A, args contains an Array with all parameters. We access the last parameter via the Array method .at() in line B.

46.15.8 `regExp.test(str)`: is there a match? ^ES3

The regular expression method regExp.test(str) returns true if regExp matches str:

> /bc/.test('ABCD')
false
> /bc/i.test('ABCD')
true
> /\.mjs$/.test('main.mjs')
true

With .test() we should normally avoid the /g flag. If we use it, we generally don’t get the same result every time we call the method:

> const r = /a/g;
> r.test('aab')
true
> r.test('aab')
true
> r.test('aab')
false

The results are due to /a/ having two matches in the string. After all of those were found, .test() returns false. For more information, see “The flags /g and /y, and the property .lastIndex (advanced)” (§46.16).

46.15.9 `string.search(regExp)`: at what index is the match? ^ES3

The string method .search() returns the first index of str at which there is a match for regExp:

> '_abc_'.search(/abc/)
1
> 'main.mjs'.search(/\.mjs$/)
4
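
Two more properties of .search() are worth a quick sketch: If there is no match, it returns -1. And, as explained in §46.16.1, it ignores flag /g and .lastIndex.

```javascript
const re = /#/g;
re.lastIndex = 2;

console.log('#--#'.search(re)); // 0 (flag /g and .lastIndex are ignored)
console.log(re.lastIndex); // 2 (unchanged)
console.log('abc'.search(/#/)); // -1 (no match)
```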

46.15.10 `string.split(separator, limit?)`: splitting strings ^ES3

Splits the string into an Array of substrings – the strings that occur between the separators.

The separator can be a string:

> 'a : b : c'.split(':')
[ 'a ', ' b ', ' c' ]

It can also be a regular expression:

> 'a x:yyy b'.split(/x+:y+/)
[ 'a ', ' b' ]
> 'a x:yyy b'.split(/(x+):(y+)/)
[ 'a ', 'x', 'yyy', ' b' ]

The last invocation demonstrates that captures made by groups in the regular expression become elements of the returned Array.

If we want the separators to be part of the returned string fragments, we can use a regular expression with a lookbehind assertion or a lookahead assertion:

> 'a: b: c'.split(/(?<=:) */)
[ 'a:', 'b:', 'c' ]
> 'a :b :c'.split(/ *(?=:)/)
[ 'a', ':b', ':c' ]

Thanks to the lookaround assertions, the regular expression used for splitting leaves the colons in the fragments and only removes the spaces following or preceding the colons.

Pitfall: .split('') splits into JavaScript characters, but we usually want to split into grapheme clusters or at least Unicode code points. Therefore, it’s better to use Intl.Segmenter or Array.from() for splitting. For more information, see “Atoms of text: code points, JavaScript characters, grapheme clusters” (§22.7).
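
The following sketch compares the three ways of splitting. The test string is an assumption chosen so that all three counts differ; the locale 'en' is arbitrary.

```javascript
// 'e' followed by a combining acute accent (one grapheme cluster,
// two code points), then an emoji beyond the Basic Multilingual
// Plane (one code point, two JavaScript characters)
const str = 'e\u0301\u{1F642}';

console.log(str.split('').length); // 4 JavaScript characters
console.log(Array.from(str).length); // 3 code points

const segmenter = new Intl.Segmenter('en', {granularity: 'grapheme'});
const graphemes = Array.from(segmenter.segment(str), (s) => s.segment);
console.log(graphemes.length); // 2 grapheme clusters
```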

46.16 The flags `/g` and `/y`, and the property `.lastIndex` (advanced)

In this section, we examine how the RegExp flags /g and /y work and how they depend on the RegExp property .lastIndex. We’ll also discover an interesting use case for .lastIndex that you may find surprising.

46.16.1 The flags `/g` and `/y`

Every method reacts differently to /g and /y; this gives us a rough general idea:

/g (.global, ES3): The regular expression should match multiple times, anywhere in a string.
/y (.sticky, ES6): Any match inside a string should immediately follow a previous match (the matches “stick” together).

If a regular expression has neither the flag /g nor the flag /y, matching happens once and starts at the beginning.

With either /g or /y, matching is performed relative to a “current position” inside the input string. That position is stored in the regular expression property .lastIndex.

There are three groups of regular-expression-related methods:

The string methods .search(regExp) and .split(regExp) completely ignore /g and /y (and therefore also .lastIndex).
The RegExp methods .exec(str) and .test(str) change in two ways if either /g or /y is set.

First, we get multiple matches, by calling one method repeatedly. Each time, it returns either another result (a match object or true) or an “end of results” value (null or false).

Second, the regular expression property .lastIndex is used to step through the input string. On one hand, .lastIndex determines where matching starts:
- /g means that a match must begin at .lastIndex or later.
- /y means that a match must begin at .lastIndex. That is, the beginning of the regular expression is anchored to .lastIndex.
  
  Note that ^ and $ continue to work as usual: They anchor matches to the beginning or end of the input string, unless .multiline is set. Then they anchor to the beginnings or ends of lines.
On the other hand, .lastIndex is set to one plus the last index of the previous match.
The remaining methods are:
- The string methods .match(regExp) and .matchAll(regExp) (throws without /g)
- The string methods .replace(regExp, str) and .replaceAll(regExp, str) (throws without /g)
These are all affected as follows:
- /g leads to multiple matches.
- /y leads to a single match that must start at .lastIndex.
- /gy leads to multiple matches without gaps.

This was a first overview. The next sections get into more details.

46.16.2 How exactly are methods affected by `/g` and `/y`?

The following methods are unaffected by /g and /y:

string.search(regExp)
string.split(regExp)

46.16.2.1 `string.match(regExp)` ^ES3

Without /g and /y, .match() ignores .lastIndex and always returns a match object for the first match.

> const re = /#/; re.lastIndex = 1;
> ['##-#'.match(re), re.lastIndex]
[{ 0: '#', index: 0, input: '##-#' }, 1]
> ['##-#'.match(re), re.lastIndex]
[{ 0: '#', index: 0, input: '##-#' }, 1]

With /y, the match must start at exactly .lastIndex. .lastIndex is updated. If there is no match, null is returned.

> const re = /#/y; re.lastIndex = 1;
> ['##-#'.match(re), re.lastIndex]
[{ 0: '#', index: 1, input: '##-#' }, 2]
> ['##-#'.match(re), re.lastIndex]
[null, 0]

With /g, we get all matches (group 0) in an Array. .lastIndex is ignored and reset to zero.

> const re = /#/g; re.lastIndex = 1;
> '##-#'.match(re)
['#', '#', '#']
> re.lastIndex
0

/gy works similarly to /g, but no gaps between matches are allowed:

> const re = /#/gy; re.lastIndex = 1;
> '##-#'.match(re)
['#', '#']
> re.lastIndex
0

46.16.2.2 `string.matchAll(regExp)` ^ES2020

If /g is not set, .matchAll() throws an exception:

> const re = /#/y; re.lastIndex = 1;
> '##-#'.matchAll(re)
TypeError: String.prototype.matchAll called with
a non-global RegExp argument

If /g is set, matching starts at .lastIndex and that property isn’t changed:

> const re = /#/g; re.lastIndex = 1;
> Array.from('##-#'.matchAll(re))
[
  { 0: '#', index: 1, input: '##-#' },
  { 0: '#', index: 3, input: '##-#' },
]
> re.lastIndex
1

/gy works similarly to /g, but no gaps between matches are allowed:

> const re = /#/gy; re.lastIndex = 1;
> Array.from('##-#'.matchAll(re))
[
  { 0: '#', index: 1, input: '##-#' },
]
> re.lastIndex
1

46.16.2.3 `regExp.exec(str)` ^ES3

Without /g and /y, .exec() ignores .lastIndex and always returns a match object for the first match:

> const re = /#/; re.lastIndex = 1;
> [re.exec('##-#'), re.lastIndex]
[{ 0: '#', index: 0, input: '##-#' }, 1]
> [re.exec('##-#'), re.lastIndex]
[{ 0: '#', index: 0, input: '##-#' }, 1]

With /g, the match must start at .lastIndex or later. .lastIndex is updated. If there is no match, null is returned.

> const re = /#/g; re.lastIndex = 1;
> [re.exec('##-#'), re.lastIndex]
[{ 0: '#', index: 1, input: '##-#' }, 2]
> [re.exec('##-#'), re.lastIndex]
[{ 0: '#', index: 3, input: '##-#' }, 4]
> [re.exec('##-#'), re.lastIndex]
[null, 0]

With /y, the match must start at exactly .lastIndex. .lastIndex is updated. If there is no match, null is returned.

> const re = /#/y; re.lastIndex = 1;
> [re.exec('##-#'), re.lastIndex]
[{ 0: '#', index: 1, input: '##-#' }, 2]
> [re.exec('##-#'), re.lastIndex]
[null, 0]

With /gy, .exec() behaves the same as with /y.
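
A short sketch that confirms this with the same input as before:

```javascript
const re = /#/gy;
re.lastIndex = 1;

const first = re.exec('##-#');
console.log(first.index, re.lastIndex); // 1 2

// At index 2, there is a '-'. Flag /y prevents .exec()
// from skipping ahead to the '#' at index 3.
const second = re.exec('##-#');
console.log(second, re.lastIndex); // null 0
```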

46.16.2.4 `string.replace(regExp, str)` ^ES3

Without /g and /y, only the first occurrence is replaced:

> const re = /#/; re.lastIndex = 1;
> '##-#'.replace(re, 'x')
'x#-#'
> re.lastIndex
1

With /g, all occurrences are replaced. .lastIndex is ignored but reset to zero.

> const re = /#/g; re.lastIndex = 1;
> '##-#'.replace(re, 'x')
'xx-x'
> re.lastIndex
0

With /y, only the (first) occurrence at .lastIndex is replaced. .lastIndex is updated.

> const re = /#/y; re.lastIndex = 1;
> '##-#'.replace(re, 'x')
'#x-#'
> re.lastIndex
2

/gy works like /g, but gaps between matches are not allowed:

> const re = /#/gy; re.lastIndex = 1;
> '##-#'.replace(re, 'x')
'xx-#'
> re.lastIndex
0

46.16.2.5 `string.replaceAll(regExp, str)` ^ES2021

.replaceAll() works like .replace() but throws an exception if /g is not set:

> const re = /#/y; re.lastIndex = 1;
> '##-#'.replaceAll(re, 'x')
TypeError: String.prototype.replaceAll called
with a non-global RegExp argument

46.16.2.6 `regExp.test(str)` ^ES3

This method behaves the same as .exec(), but instead of returning a match object, it returns true, and instead of returning null, it returns false.

For example, without either /g or /y, the result is always true:

> const re = /#/; re.lastIndex = 1;
> [re.test('##-#'), re.lastIndex]
[true, 1]
> [re.test('##-#'), re.lastIndex]
[true, 1]

With /g, there are two matches:

> const re = /#/g; re.lastIndex = 1;
> [re.test('##-#'), re.lastIndex]
[true, 2]
> [re.test('##-#'), re.lastIndex]
[true, 4]
> [re.test('##-#'), re.lastIndex]
[false, 0]

With /y, there is only one match:

> const re = /#/y; re.lastIndex = 1;
> [re.test('##-#'), re.lastIndex]
[true, 2]
> [re.test('##-#'), re.lastIndex]
[false, 0]

With /gy, .test() behaves the same as with /y.

46.16.3 Four pitfalls of `/g` and `/y` and how to deal with them

We will first look at four pitfalls of /g and /y and then at ways of dealing with those pitfalls.

46.16.3.1 Pitfall 1: We can’t inline a regular expression with `/g` or `/y`

A regular expression with /g can’t be inlined. For example, in the following while loop, the regular expression is created fresh, every time the condition is checked. Therefore, its .lastIndex is always zero and the loop never terminates.

let matchObj;
// Infinite loop
while (matchObj = /a+/g.exec('bbbaabaaa')) {
  console.log(matchObj[0]);
}

With /y, the problem is the same.
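
The usual remedy is to create the regular expression once, outside the loop. A minimal sketch:

```javascript
// Created once, so .lastIndex survives between calls of .exec()
const regExp = /a+/g;

const matches = [];
let matchObj;
while ((matchObj = regExp.exec('bbbaabaaa')) !== null) {
  matches.push(matchObj[0]);
}
console.log(matches); // [ 'aa', 'aaa' ]
```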

46.16.3.2 Pitfall 2: Removing `/g` or `/y` can break code

If code expects a regular expression with /g and has a loop over the results of .exec() or .test(), then a regular expression without /g can cause an infinite loop:

function collectMatches(regExp, str) {
  const matches = [];
  let matchObj;
  // Infinite loop
  while (matchObj = regExp.exec(str)) {
    matches.push(matchObj[0]);
  }
  return matches;
}
collectMatches(/a+/, 'bbbaabaaa'); // Missing: flag /g

Why is there an infinite loop? Because .exec() always returns the first result, a match object, and never null.

With /y, the problem is the same.

46.16.3.3 Pitfall 3: Adding `/g` or `/y` can break code

With .test(), there is another caveat: It is affected by .lastIndex. Therefore, if we want to check exactly once if a regular expression matches a string, then the regular expression must not have /g. Otherwise, we generally get a different result every time we call .test():

> const regExp = /^X/g;
> [regExp.test('Xa'), regExp.lastIndex]
[ true, 1 ]
> [regExp.test('Xa'), regExp.lastIndex]
[ false, 0 ]
> [regExp.test('Xa'), regExp.lastIndex]
[ true, 1 ]

The first invocation produces a match and updates .lastIndex. The second invocation does not find a match and resets .lastIndex to zero.

If we create a regular expression specifically for .test(), then we probably won’t add /g. However, the likelihood of encountering /g increases if we use the same regular expression for replacing and for testing.

Once again, this problem also exists with /y:

> const regExp = /^X/y;
> regExp.test('Xa')
true
> regExp.test('Xa')
false
> regExp.test('Xa')
true
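
If we can’t control which flags a regular expression has, one option is to test with a clone that has neither /g nor /y. The helper testOnce() is a hypothetical name. Note that dropping /y changes its meaning (a match no longer has to start at .lastIndex), so this sketch only fits if we want to know whether there is a match anywhere.

```javascript
// Hypothetical helper: test with a clone that has neither /g nor /y
function testOnce(regExp, str) {
  const flags = regExp.flags.replace(/[gy]/g, '');
  return new RegExp(regExp, flags).test(str);
}

const regExp = /^X/g;
console.log(testOnce(regExp, 'Xa')); // true
console.log(testOnce(regExp, 'Xa')); // true
console.log(regExp.lastIndex); // 0 (the original is never used for matching)
```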

46.16.3.4 Pitfall 4: Code can produce unexpected results if `.lastIndex` isn’t zero

If an operation uses regular expression methods that are affected by .lastIndex, then we must ensure that .lastIndex is zero at the beginning. Otherwise, we may get unexpected results:

function countMatches(regExp, str) {
  let count = 0;
  while (regExp.test(str)) {
    count++;
  }
  return count;
}

const myRegExp = /a/g;
myRegExp.lastIndex = 4;
assert.equal(
  countMatches(myRegExp, 'babaa'), 1 // should be 3
);

Normally, .lastIndex is zero in newly created regular expressions and we won’t change it explicitly like we did in the example. But .lastIndex can still end up not being zero if we use the regular expression multiple times.

46.16.3.5 How to avoid the pitfalls of `/g` and `/y`

As an example of dealing with /g and .lastIndex, we revisit countMatches() from the previous example. How do we prevent a wrong regular expression from breaking our code? Let’s look at three approaches.

46.16.3.5.1 Throwing exceptions

First, we can throw an exception if /g isn’t set or .lastIndex isn’t zero:

function countMatches(regExp, str) {
  if (!regExp.global) {
    throw new Error('Flag /g of regExp must be set');
  }
  if (regExp.lastIndex !== 0) {
    throw new Error('regExp.lastIndex must be zero');
  }
  
  let count = 0;
  while (regExp.test(str)) {
    count++;
  }
  return count;
}

46.16.3.5.2 Cloning regular expressions

Second, we can clone the parameter. That has the added benefit that regExp won’t be changed.

function countMatches(regExp, str) {
  const cloneFlags = regExp.flags + (regExp.global ? '' : 'g');
  const clone = new RegExp(regExp, cloneFlags);

  let count = 0;
  while (clone.test(str)) {
    count++;
  }
  return count;
}
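
To see the effect, the following sketch repeats the cloning version of countMatches() (so that the snippet is self-contained) and calls it with two regular expressions that would have caused trouble before:

```javascript
function countMatches(regExp, str) {
  const cloneFlags = regExp.flags + (regExp.global ? '' : 'g');
  const clone = new RegExp(regExp, cloneFlags);

  let count = 0;
  while (clone.test(str)) {
    count++;
  }
  return count;
}

// .lastIndex isn't zero: the clone starts at zero anyway
const myRegExp = /a/g;
myRegExp.lastIndex = 4;
console.log(countMatches(myRegExp, 'babaa')); // 3
console.log(myRegExp.lastIndex); // 4 (unchanged)

// Flag /g is missing: the clone adds it
console.log(countMatches(/a/, 'babaa')); // 3
```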

46.16.3.5.3 Using an operation that isn’t affected by `.lastIndex` or flags

Several regular expression operations are not affected by .lastIndex or by flags. For example, .match() ignores .lastIndex if /g is present:

function countMatches(regExp, str) {
  if (!regExp.global) {
    throw new Error('Flag /g of regExp must be set');
  }
  return (str.match(regExp) ?? []).length;
}

const myRegExp = /a/g;
myRegExp.lastIndex = 4;
assert.equal(countMatches(myRegExp, 'babaa'), 3); // OK!

Here, countMatches() works even though we didn’t check or fix .lastIndex.

46.16.4 Use case for `.lastIndex`: starting matching at a given index

Apart from storing the current position when matching multiple times, .lastIndex can also be used to:

Match once at exactly a given position (via flag /y)
Match once at a given position or later (via flag /g)

We’ll look at examples next.

46.16.4.1 Example: Checking if a regular expression matches at a given index

Given that .test() is affected by /y and .lastIndex, we can use it to check if a regular expression regExp matches a string str at exactly a given index:

function startsWith(regExp, str, index) {
  if (!regExp.sticky || regExp.global) {
    throw new Error('Flag /y must be set. Flag /g must not be set.');
  }
  regExp.lastIndex = index;
  return regExp.test(str);
}
assert.equal(
  startsWith(/x+/y, 'aaxxx', 0), false
);
assert.equal(
  startsWith(/x+/y, 'aaxxx', 2), true
);

regExp is anchored to .lastIndex due to /y.

Note that we must not use the assertion ^ which would anchor regExp to the beginning of the input string.
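
startsWith() changes the .lastIndex of its argument and insists on specific flags. If that is inconvenient, a variant can work on a clone that has flag /y added. This is a sketch; the name matchesAt is hypothetical. It relies on .test() behaving the same with /gy as with /y (§46.16.2.6).

```javascript
function matchesAt(regExp, str, index) {
  const flags = regExp.flags + (regExp.sticky ? '' : 'y');
  const clone = new RegExp(regExp, flags);
  clone.lastIndex = index;
  return clone.test(str);
}

console.log(matchesAt(/x+/, 'aaxxx', 0)); // false
console.log(matchesAt(/x+/, 'aaxxx', 2)); // true
console.log(matchesAt(/x+/g, 'aaxxx', 2)); // true
```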

46.16.4.2 Example: Finding the location of a match, starting at a given index

.search() lets us find the location where a regular expression matches:

> '#--#'.search(/#/)
0

Alas, we can’t change where .search() starts looking for matches. As a workaround, we can use .exec() for searching:

function searchFrom(regExp, str, index) {
  if (!regExp.global || regExp.sticky) {
    throw new Error('Flag /g must be set. Flag /y must not be set.');
  }
  regExp.lastIndex = index;
  const match = regExp.exec(str);
  if (match) {
    return match.index;
  } else {
    return -1;
  }
}

assert.equal(
  searchFrom(/#/g, '#--#', 0), 0
);
assert.equal(
  searchFrom(/#/g, '#--#', 1), 3
);

46.16.4.3 Example: Replacing an occurrence at a given index

When used without /g and with /y, .replace() makes one replacement – if there is a match at .lastIndex:

function replaceOnceAt(str, regExp, replacement, index) {
  if (!regExp.sticky || regExp.global) {
    throw new Error('Flag /y must be set. Flag /g must not be set.');
  }
  regExp.lastIndex = index;
  return str.replace(regExp, replacement);
}
assert.equal(
  replaceOnceAt('aa aaaa a', /a+/y, 'X', 0), 'X aaaa a'
);
assert.equal(
  replaceOnceAt('aa aaaa a', /a+/y, 'X', 3), 'aa X a'
);
assert.equal(
  replaceOnceAt('aa aaaa a', /a+/y, 'X', 8), 'aa aaaa X'
);

46.16.5 The downsides and an upside of `.lastIndex`

The regular expression property .lastIndex has two significant downsides:

- It makes regular expressions stateful:
  - We now have to be mindful of the states of regular expressions and how we share them.
  - For many use cases, we can’t make them immutable via freezing, either.
- Support for .lastIndex is inconsistent among regular expression operations.

On the upside, .lastIndex also gives us additional useful functionality: We can dictate where matching should begin (for some operations). Ideally, such functionality would be provided via an assertion such as \G (vs. a flag /y) and an index as an argument for various regular-expression-related methods (vs. a regular expression property .lastIndex). But it’s better than not having it at all.
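The following sketch illustrates both aspects of statefulness. RE_DIGIT and hasDigit() are hypothetical names chosen for this example: a shared regular expression with flag /g remembers where the previous call stopped, and freezing such a regular expression prevents .lastIndex from being updated.

```js
// A shared regular expression with flag /g
const RE_DIGIT = /[0-9]/g;
function hasDigit(str) {
  return RE_DIGIT.test(str);
}

const first = hasDigit('a1'); // match at index 1, .lastIndex becomes 2
const second = hasDigit('1a'); // starts at index 2 and misses the digit
console.log(first, second); // true false

// Freezing makes .lastIndex read-only, so updating it fails
const frozen = Object.freeze(/[0-9]/g);
let error;
try {
  frozen.test('a1');
} catch (err) {
  error = err;
}
console.log(error instanceof TypeError); // true
```

Removing flag /g from RE_DIGIT would avoid the first problem, because .test() then ignores .lastIndex.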

46.17 `RegExp.escape()`: escaping text so that it can be used inside a regular expression ^ES2025

For a string text, RegExp.escape(text) creates a regular expression pattern that matches text.

Characters that have special meaning in regular expressions can’t be used verbatim and have to be escaped:

> RegExp.escape('(*)')
'\\(\\*\\)'

Note that we see each regular expression backslash twice: One of them is the actual backslash, the other one escapes it inside the string literal:

> '\\(\\*\\)' === String.raw`\(\*\)`
true

Characters that have no special meaning don’t have to be escaped:

> RegExp.escape('_abc123')
'_abc123'

46.17.1 Use case for `RegExp.escape()`: replacing all occurrences of a text

The classic use case for escaping was searching and replacing text:

function replacePlainText(str, searchText, replace) {
  const searchRegExp = new RegExp(
    RegExp.escape(searchText),
    'gu'
  );
  return str.replace(searchRegExp, replace);
}
assert.equal(
  replacePlainText('(a) and (a)', '(a)', '@'),
  '@ and @'
);

However, since ES2021, we have .replaceAll():

assert.equal(
  '(a) and (a)'.replaceAll('(a)', '@'),
  '@ and @'
);

46.17.2 Use case for `RegExp.escape()`: part of a regular expression must match a given text

The following code removes all occurrences of text inside str that are not quoted, by replacing each of them with a bullet (•):

function removeUnquotedText(str, text) {
  const regExp = new RegExp(
    `(?<!“)${RegExp.escape(text)}(?!”)`,
    'gu'
  );
  return str.replaceAll(regExp, '•');
}
assert.equal(
  removeUnquotedText('“yes” and yes and “yes”', 'yes'),
  '“yes” and • and “yes”'
);

The same approach can also be used to find or count unquoted text.
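As a sketch of the counting variant: countUnquotedText() and escapeText() are hypothetical names. So that the code also runs on engines without RegExp.escape(), escapeText() falls back to a simplified replacement that only escapes syntax characters. That fallback is an assumption of this example, not a full polyfill.

```js
// Uses RegExp.escape() where available. The fallback is a simplified
// stand-in that only escapes syntax characters.
const escapeText = typeof RegExp.escape === 'function'
  ? (text) => RegExp.escape(text)
  : (text) => text.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');

// Counts the occurrences of `text` that are not quoted
function countUnquotedText(str, text) {
  const regExp = new RegExp(
    `(?<!“)${escapeText(text)}(?!”)`,
    'gu'
  );
  return Array.from(str.matchAll(regExp)).length;
}

console.log(countUnquotedText('“yes” and yes and “yes”', 'yes')); // 1
console.log(countUnquotedText('(a) “(a)” (a)', '(a)')); // 2
```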

Exercise: Searching for text that contains a wildcard

✅ exercises/regexps/includes-with-elision_test.mjs

46.18 Matching everything or nothing

Sometimes, we may need a regular expression that matches everything or nothing – for example, as a default value.

Match everything: /(?:)/

The empty group () matches everything. We make it non-capturing (via ?:), to avoid unnecessary work.
```
> /(?:)/.test('')
true
> /(?:)/.test('abc')
true
```
Match nothing: /.^/

^ only matches at the beginning of a string. The dot moves matching beyond the first character and now ^ doesn’t match anymore.
```
> /.^/.test('')
false
> /.^/.test('abc')
false
```

Regular expression literals can’t be empty because // starts a single-line comment. Therefore, the first of the previous two regular expressions is used in this case:

> new RegExp('')
/(?:)/
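This is a sketch of the default-value use case mentioned at the beginning of this section. filterLines() and its parameters are hypothetical names: by default, every line is included and no line is excluded.

```js
const MATCH_EVERYTHING = /(?:)/;
const MATCH_NOTHING = /.^/;

// Keeps the lines that match `include` and don't match `exclude`
function filterLines(
  lines, include = MATCH_EVERYTHING, exclude = MATCH_NOTHING
) {
  return lines.filter(
    (line) => include.test(line) && !exclude.test(line)
  );
}

const lines = ['# title', 'text', ''];
console.log(filterLines(lines)); // [ '# title', 'text', '' ]
console.log(filterLines(lines, /^#/)); // [ '# title' ]
console.log(filterLines(lines, undefined, /^#/)); // [ 'text', '' ]
```

Neither default has flag /g or /y, so .test() does not depend on .lastIndex here.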

46.19 Tips for making regular expressions easier to use

In this section, we explore ways in which we can make regular expressions easier to use. We’ll use the following regular expression as an example:

const RE_API_SIGNATURE =
  /^(new |get )?([A-Za-z0-9_.\[\]]+)/;

Right now, it is still fairly cryptic. It will be much easier to understand once we get to “insignificant whitespace”.

46.19.1 Tip: Use flag `/v`

If we add flag /v to our regular expression, we get fewer quirks and more features:

const RE_API_SIGNATURE =
  /^(new |get )?([A-Za-z0-9_.\[\]]+)/v;

/v doesn’t change anything in this particular case, but it helps us if we add grapheme clusters with more than one code point or if we want features such as set operations in character classes.

46.19.2 Tip: order flags alphabetically

If there is more than one flag, we should order the flags alphabetically – e.g.:

/pattern/giv

That makes ordering consistent and is also how JavaScript displays regular expressions:

> String(/pattern/vgi)
'/pattern/giv'
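The property .flags uses the same order, regardless of how the flags were written. A small sketch, using only flags that have been available for a long time:

```js
// The flags are deliberately specified in a non-alphabetical order
const regExp = new RegExp('pattern', 'yig');

console.log(regExp.flags); // giy
console.log(String(regExp)); // /pattern/giy
```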

46.19.3 Tip: Use named capture groups

Our regular expression contains two positional capture groups. If we name them, they describe their purposes and we need less external documentation:

const RE_API_SIGNATURE =
  /^(?<prefix>new |get )?(?<name>[A-Za-z0-9_.\[\]]+)/;
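Code that uses the match becomes more descriptive too. A minimal sketch of accessing the captures by name:

```js
const RE_API_SIGNATURE =
  /^(?<prefix>new |get )?(?<name>[A-Za-z0-9_.\[\]]+)/;

const match = RE_API_SIGNATURE.exec('get Map.prototype.size');
if (match === null) {
  throw new Error('No match');
}

// Named groups: we access captures by name
const {prefix, name} = match.groups;
console.log(JSON.stringify(prefix)); // "get "
console.log(name); // Map.prototype.size

// Positional access still works but is less descriptive
console.log(match[2]); // Map.prototype.size
```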

46.19.4 Tip: insignificant whitespace and line comments via `#`

So far, the regular expression is still fairly hard to read. We can change that by adding spaces and line breaks. Since regular expression literals don’t allow us to do that, we use the library Regex+ which provides us with the template tag regex:

import {regex} from 'regex';

const RE_API_SIGNATURE = regex`
  ^
  (?<prefix>
    new \x20  # constructor
    |
    get \x20  # getter
  )?
  (?<name>
    # Square brackets are needed for symbol keys
    [
      A-Z a-z 0-9 _
      .
      \[ \]
    ]+
  )
`;

The feature of ignoring whitespace in regular expression patterns is called insignificant whitespace. Additionally, we used a feature called line comments – which are started by hash symbols (#) and extend to the end of the line.

Two observations:

Since all spaces are removed, we use the hex escape \x20 to express that there is a space after new and after get.
Alas, line comments are not allowed inside character classes. That’s why the comment about square brackets comes before the character class.

In the future, JavaScript may get built-in support for insignificant whitespace via a flag /x (ECMAScript proposal).

With the regex template tag, the following flags are always active:

Flag /v
Flag /x (emulated) enables insignificant whitespace and line comments via #.
Flag /n (emulated) enables named capture only mode, which prevents numbered groups from capturing. In other words: (pattern) is treated like (?:pattern).

46.19.5 Tip: Write tests for your regular expression

To make sure that a regular expression works as intended, we can write tests for it. These are tests for RE_API_SIGNATURE:

assert.deepEqual(
  getCaptures(`get Map.prototype.size`),
  {
    prefix: 'get ',
    name: 'Map.prototype.size',
  }
);
assert.deepEqual(
  getCaptures(`new Array(len = 0)`),
  {
    prefix: 'new ',
    name: 'Array',
  }
);
assert.deepEqual(
  getCaptures(`Array.prototype.push(...items)`),
  {
    prefix: undefined,
    name: 'Array.prototype.push',
  }
);
assert.deepEqual(
  getCaptures(`Map.prototype[Symbol.iterator]()`),
  {
    prefix: undefined,
    name: 'Map.prototype[Symbol.iterator]',
  }
);

function getCaptures(apiSignature) {
  const match = RE_API_SIGNATURE.exec(apiSignature);
  // Spread so that the result does not have a null prototype
  // and is easier to compare.
  return {...match.groups};
}

46.19.6 Tip: Mention examples in your documentation

Seeing strings that match helps with understanding what a regular expression is supposed to do:

/**
 * Matches API signatures – e.g.:
 * ```
 * `get Map.prototype.size`
 * `new Array(len = 0)`
 * `Array.prototype.push(...items)`
 * `Map.prototype[Symbol.iterator]()`
 * ```
 */
const RE_API_SIGNATURE = regex`
  ···
`;

Some documentation tools let us refer to unit tests in doc comments and show their code in the documentation. That’s a good alternative to what we have done above.

46.19.7 Bonus tip: Use interpolation to reuse patterns

The Regex+ library lets us interpolate regular expression fragments (“patterns”), which helps with reuse. The following example defines a simple markup syntax that is reminiscent of HTML:

import { pattern, regex } from 'regex';

const LABEL = pattern`[a-z\-]+`;
const ARGS = pattern`
  (?<args>
    \x20+
    ${LABEL}
  )*
`;
const NAME = pattern`
  (?<name> ${LABEL} )
`;

const TAG = regex`
  (?<openingTag>
    \[
    \x20*
    ${NAME}
    ${ARGS}
    \x20*
    \]
  )
  |
  (?<singletonTag>
    \[
    \x20*
    ${NAME}
    ${ARGS}
    \x20*
    / \]
  )
`;

assert.deepEqual(
  TAG.exec('[pre js line-numbers]').groups,
  {
    openingTag: '[pre js line-numbers]',
    name: 'pre',
    args: ' line-numbers',
    singletonTag: undefined,
    __proto__: null,
  }
);

assert.deepEqual(
  TAG.exec('[hr /]').groups,
  {
    openingTag: undefined,
    name: 'hr',
    args: undefined,
    singletonTag: '[hr /]',
    __proto__: null,
  }
);

The regular expression TAG uses the regular expression fragments NAME and ARGS twice – which reduces redundancy.

46.19.8 Bonus tip: insignificant whitespace without a library

With the following trick, we don’t need a library to write a regular expression with insignificant whitespace:

const RE_API_SIGNATURE = new RegExp(
  String.raw`
    ^
    (?<prefix>
      new \x20
      |
      get \x20
    )?
    (?<name>
      [
        A-Z a-z 0-9 _
        .
        \[ \]
      ]+
    )
  `.replaceAll(/\s+/g, ''), // (A)
  'v'
);
assert.equal(
  String(RE_API_SIGNATURE),
  String.raw`/^(?<prefix>new\x20|get\x20)?(?<name>[A-Za-z0-9_.\[\]]+)/v`
);

How does this code work?

- String.raw enables two things:
  - We don’t have to escape regular expression backslashes for this kind of string literal.
  - The regular expression can span multiple lines.
- .replaceAll() removes all whitespace (spaces, tabs, line breaks, etc.) so that the end result looks almost like the initial version of the regular expression. There is one difference, though: Since literal spaces are removed, we have to find a different way to specify that there is a space after new and after get. One option is the hex escape \x20: hexadecimal 20 (decimal 32) is the code point SPACE.

We can even emulate inline comments like this:

// Template tag function
const cmt = () => '';
const RE = new RegExp(
  String.raw`
    a+ ${cmt`one or more as`}
  `.replaceAll(/\s+/g, ''),
  'v'
);
assert.equal(
  String(RE), '/a+/v'
);

Alas, it’s more syntactically noisy than I’d like.

46.19.9 Conclusion: This is how regular expressions are meant to be written

One reason why many people don’t like regular expressions is that they find them difficult to read. However, that is much less of a problem with insignificant whitespace and comments. I’d argue that is the proper way of writing regular expressions: Think what JavaScript code would look like if we had to write it without whitespace and comments.

46.20 Quick reference: regular expression functionality

46.20.1 Summary: `.global` (`/g`) and `.sticky` (`/y`)

The following two methods are completely unaffected by /g and /y:

String.prototype.search()
String.prototype.split()

This table explains how the remaining regular-expression-related methods are affected by these two flags. If neither /g nor /y is set, regExp.lastIndex is always ignored. In the last column, 0 means that .lastIndex is reset to zero:

	Flags	Honors `.lastIndex`?	Updates `.lastIndex`?
`s.match`	`/y`	✔	✔
	`/g /gy`	✘	0
`s.matchAll`	`/g /gy`	✔	✘
`r.exec`	`/g /y /gy`	✔	✔
`s.replace`	`/y`	✔	✔
	`/g /gy`	✘	0
`s.replaceAll`	`/g /gy`	✘	0
`r.test`	`/g /y /gy`	✔	✔
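Two rows of the table can be checked with a small sketch: .match() with /g ignores .lastIndex and resets it to zero, while .matchAll() honors .lastIndex and leaves it unchanged.

```js
const regExp = /a/g;
const str = 'aXaXa';

// .match() with /g ignores .lastIndex and resets it to 0
regExp.lastIndex = 2;
const matched = str.match(regExp);
const afterMatch = regExp.lastIndex;
console.log(matched, afterMatch); // [ 'a', 'a', 'a' ] 0

// .matchAll() honors .lastIndex but does not change it
regExp.lastIndex = 2;
const indices = Array.from(str.matchAll(regExp), (m) => m.index);
const afterMatchAll = regExp.lastIndex;
console.log(indices, afterMatchAll); // [ 2, 4 ] 2
```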

A longer overview

I posted a longer table online that was created via a Node.js script.

46.20.2 `String.prototype.*`: matching and searching

String.prototype.match(regExpOrString)
ES3 | /y honors and updates .lastIndex | /g /gy ignore and reset .lastIndex
- (1 of 3) regExpOrString is a string.
  
  If regExpOrString is a string, it defines a pattern for a regular expression without /g (think parameter of new RegExp()). That regular expression is used as explained in the next list item.
- (2 of 3) regExpOrString is a RegExp without /g.
```
match(
  regExpOrString: string | RegExp
): null | RegExpMatchArray

interface RegExpMatchArray extends Array<string> {
  index: number;
  input: string;
  groups: undefined | {
    [key: string]: string
  };
}
```
  If regExpOrString is a regular expression with flag /g not set, then .match() returns the first match for regExpOrString within the string. Or null if there is no match.
  - Numbered capture groups become Array elements (which is why RegExpMatchArray extends Array).
  - Named capture groups ^ES2018 become properties of .groups.
  Examples:
```
> 'ababb'.match(/a(b+)/)
{ 0: 'ab', 1: 'b', index: 0, input: 'ababb', groups: undefined }
> 'ababb'.match(/a(?<bs>b+)/)
{ 0: 'ab', 1: 'b', index: 0, input: 'ababb', groups: { bs: 'b' } }
> 'abab'.match(/x/)
null
```
- (3 of 3) regExpOrString is a RegExp with /g.
```
match(
  regExpOrString: RegExp
): null | Array<string>
```
  If flag /g of regExpOrString is set, .match() returns either an Array with all matches or null if there was no match.
```
> 'ababb'.match(/a(b+)/g)
[ 'ab', 'abb' ]
> 'ababb'.match(/a(?<bs>b+)/g)
[ 'ab', 'abb' ]
> 'abab'.match(/x/g)
null
```
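The examples above only cover regular expression arguments. As a sketch of case 1 of 3: a string argument is interpreted as a pattern, not as plain text, so a dot matches any character unless we escape it.

```js
// The string '.' is a pattern that matches any character
const anyChar = 'a.b'.match('.');
console.log(anyChar[0], anyChar.index); // a 0

// To match a literal dot, the pattern must be \.
// (the backslash is escaped inside the string literal)
const literalDot = 'a.b'.match('\\.');
console.log(literalDot[0], literalDot.index); // . 1
```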

String.prototype.matchAll(regExp)
ES2020 | /g /gy honor and preserve .lastIndex

matchAll(regexp: RegExp): Iterator<RegExpExecArray>
interface RegExpExecArray extends Array<string> {
  index: number;
  input: string;
  groups: undefined | {
    [key: string]: string
  };
}

Throws a TypeError if flag /g is not set.
Returns an iterator over zero or more matches. Per match:
- Numbered capture groups become Array elements (which is why RegExpExecArray extends Array).
- Named capture groups ^ES2018 become properties of .groups.

Example:

> 'yes'.matchAll(/(y|s)/gv).toArray()
[
  { 0: 'y', 1: 'y', index: 0, input: 'yes', groups: undefined },
  { 0: 's', 1: 's', index: 2, input: 'yes', groups: undefined },
]
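A sketch of two more aspects: iterating over the matches via for-of while using named capture groups, and the exception that is thrown if flag /g is missing. RE_DATE is a hypothetical example pattern.

```js
const RE_DATE = /(?<year>[0-9]{4})-(?<month>[0-9]{2})/g;

const found = [];
for (const match of '2024-05 and 2025-11'.matchAll(RE_DATE)) {
  const {year, month} = match.groups;
  found.push(`${month}/${year} at ${match.index}`);
}
console.log(found); // [ '05/2024 at 0', '11/2025 at 12' ]

// Without flag /g, .matchAll() throws a TypeError
let error;
try {
  'yes'.matchAll(/y/);
} catch (err) {
  error = err;
}
console.log(error instanceof TypeError); // true
```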

String.prototype.search(regExpOrString)
ES3 | Ignores .lastIndex

Returns the index of the first match of regExpOrString within the string, or -1 if there is no match. If regExpOrString is a string, it is used to create a regular expression (think parameter of new RegExp()).
```
> 'a2b'.search(/[0-9]/)
1
> 'a2b'.search('[0-9]')
1
```
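Two more cases, as a small sketch: the result if nothing matches, and the fact that .search() neither honors nor changes .lastIndex, even if flag /g is set.

```js
// No match: the result is -1
const notFound = 'abc'.search(/[0-9]/);
console.log(notFound); // -1

// .lastIndex is ignored and left unchanged
const regExp = /b/g;
regExp.lastIndex = 2;
const index = 'abcb'.search(regExp);
console.log(index); // 1 (not 3)
console.log(regExp.lastIndex); // 2
```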

46.20.3 `String.prototype.*`: splitting and replacing

String.prototype.split(separator, limit?)
ES3 | Ignores .lastIndex
```
split(separator: string | RegExp, limit?: number): Array<string>
```
The separator can be a string (which is interpreted as plain text, not as a regular expression pattern) or a regular expression.

Examples:
```
// Splitting with a string
assert.deepEqual(
  'a.b.c'.split('.'),
  [ 'a', 'b', 'c' ]
);

// Splitting with a regular expression
assert.deepEqual(
  'a x:yyy b'.split(/x+:y+/),
  [ 'a ', ' b' ]
);

// Group captures appear in the result
assert.deepEqual(
  'a x:yyy b'.split(/(x+):(y+)/),
  [ 'a ', 'x', 'yyy', ' b' ]
);
```
If we want the separators to be part of the returned string fragments, we can use a regular expression with a lookbehind assertion or a lookahead assertion:
```
> 'a: b: c'.split(/(?<=:) */)
[ 'a:', 'b:', 'c' ]
> 'a :b :c'.split(/ *(?=:)/)
[ 'a', ':b', ':c' ]
```
Pitfall: .split('') splits into JavaScript characters, but we usually want to split into grapheme clusters or at least Unicode code points. Therefore, it’s better to use Array.from() or Intl.Segmenter for splitting. For more information, see “Atoms of text: code points, JavaScript characters, grapheme clusters” (§22.7).
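The difference between the three approaches can be sketched as follows. The example string consists of the letter e, a combining acute accent and an emoji. The locale 'en' is an arbitrary choice for this example.

```js
// 'e' + combining acute accent (U+0301), followed by an emoji
const str = 'e\u0301🙂';

// JavaScript characters (code units): the emoji is split in two
const codeUnits = str.split('');
console.log(codeUnits.length); // 4

// Code points: the emoji stays intact, the accent is separate
const codePoints = Array.from(str);
console.log(codePoints.length); // 3

// Grapheme clusters: the letter and its accent stay together
const segmenter = new Intl.Segmenter('en', {granularity: 'grapheme'});
const graphemes = Array.from(
  segmenter.segment(str), (s) => s.segment
);
console.log(graphemes.length); // 2
```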

String.prototype.replace(searchValue, replaceValue)
ES3 | /y honors and updates .lastIndex | /g /gy ignore and reset .lastIndex

For more information on this method, see its section earlier in this chapter.

(1 of 2) searchValue is a string or a RegExp without /g.

replace(
  searchValue: string | RegExp,
  replaceValue: string | (...args: any[]) => string
): string

Works similarly to .replaceAll(), but only replaces the first occurrence:

> 'x.x.'.replace('.', '#') // interpreted literally
'x#x.'
> 'x.x.'.replace(/./, '#')
'#.x.'

(2 of 2) searchValue is a RegExp with /g.

replace(
  searchValue: RegExp,
  replaceValue: string | (...args: any[]) => string
): string

Works exactly like .replaceAll():

> 'x.x.'.replace(/./g, '#')
'####'
> 'x.x.'.replace(/\./g, '#')
'x#x#'

String.prototype.replaceAll(searchValue, replaceValue)
ES2021 | /g /gy ignore and reset .lastIndex
- (1 of 2) replaceValue is a string.
```
replaceAll(
  searchValue: string | RegExp,
  replaceValue: string
): string
```
  Replaces all matches of searchValue with replaceValue. If searchValue is a regular expression without flag /g, a TypeError is thrown.
```
> 'x.x.'.replaceAll('.', '#') // interpreted literally
'x#x#'
> 'x.x.'.replaceAll(/./g, '#')
'####'
> 'x.x.'.replaceAll(/./, '#')
TypeError: String.prototype.replaceAll called with
a non-global RegExp argument
```
  Special characters in replaceValue are:
  - $$: becomes $
  - $n: becomes the capture of numbered group n (alas, $0 stands for the string '$0', it does not refer to the complete match)
  - $&: becomes the complete match
  - $`: becomes everything before the match
  - $': becomes everything after the match
  - $<name> becomes the capture of named group name
- (2 of 2) replaceValue is a function.
```
replaceAll(
  searchValue: string | RegExp,
  replaceValue: (...args: any[]) => string
): string
```
  If the second parameter is a function, occurrences are replaced with the strings it returns. Its parameters args are:
  - matched: string. The complete match
  - g1: string|undefined. The capture of numbered group 1
  - g2: string|undefined. The capture of numbered group 2
  - (Etc.)
  - offset: number. Where was the match found in the input string?
  - input: string. The whole input string
```
const regexp = /([0-9]{2})\.([0-9]{4})/g;
const replacer = (all, month, year) => `|${year}-${month}|`;
assert.equal(
  'a 12.1995 b'.replaceAll(regexp, replacer),
  'a |1995-12| b'
);
```
  Named capture groups (ES2018) are supported, too. If there are any, an argument is added at the end with an object whose properties contain the captures:
```
const regexp = /(?<month>[0-9]{2})\.(?<year>[0-9]{4})/g;
const replacer = (...args) => {
  const groups = args.at(-1);
  return `|${groups.year}-${groups.month}|`;
};
assert.equal(
  'a 12.1995 b'.replaceAll(regexp, replacer),
  'a |1995-12| b'
);
```
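The special characters that are supported if replaceValue is a string (case 1 of 2) can be tried out with a few short calls. The input strings are made up for this sketch:

```js
// $& is the complete match
const r1 = 'a-b'.replaceAll(/-/g, '[$&]');
// $1, $2 are numbered captures
const r2 = 'ab'.replaceAll(/(a)(b)/g, '$2$1');
// $<name> is a named capture
const r3 = 'john smith'.replaceAll(
  /(?<first>[a-z]+) (?<last>[a-z]+)/g, '$<last>, $<first>'
);
// $$ is a literal dollar sign (here followed by $&)
const r4 = 'price: 5'.replaceAll(/[0-9]+/g, '$$$&');
// $` is the text before the match, $' is the text after it
const r5 = 'abc'.replaceAll(/b/g, "[$`|$']");
// $0 is not special
const r6 = 'abc'.replaceAll(/b/g, '$0');

console.log(r1); // a[-]b
console.log(r2); // ba
console.log(r3); // smith, john
console.log(r4); // price: $5
console.log(r5); // a[a|c]c
console.log(r6); // a$0c
```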

46.20.4 `RegExp.prototype.*`

RegExp.prototype.test(string)
ES3 | /g /y /gy honor and update .lastIndex
```
test(string: string): boolean
```
Returns true if the receiver matches string:
```
> /^# /.test('# comment')
true
> /^# /.test('#comment')
false
> /^# /.test('abc')
false
```
Pitfall: Don’t use this method with a regular expression that has flag /g. Then regExp.test() starts matching at regExp.lastIndex and also updates that property.

RegExp.prototype.exec(string)
ES3 | /g /y /gy honor and update .lastIndex
- (1 of 2) Receiver is a RegExp without /g.
  
  Without flag /g, regExp.exec(string) works like string.match(regExp) – it returns a single match object.
- (2 of 2) Receiver is a RegExp with /g.
```
exec(string: string): RegExpExecArray | null

interface RegExpExecArray extends Array<string> {
  index: number;
  input: string;
  groups: undefined | {
    [key: string]: string
  };
}
```
  If regExp has flag /g then regExp.exec(str) returns an object for the first match starting at regExp.lastIndex – or null if it can’t find a match. It also updates regExp.lastIndex so that it points to the index after the match.
  - Numbered capture groups become Array elements (which is why RegExpExecArray extends Array).
  - Named capture groups ^ES2018 become properties of .groups.
  Examples:
```
> const regExp = /(a+)b/g, str = 'ab aab';

> regExp.exec(str)
{0: 'ab', 1: 'a', index: 0, input: 'ab aab', groups: undefined}
> regExp.lastIndex
2

> regExp.exec(str)
{0: 'aab', 1: 'aa', index: 3, input: 'ab aab', groups: undefined}
> regExp.lastIndex
6

> regExp.exec(str)
null
> regExp.lastIndex
0
```
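The interaction above is what makes the classic loop over all matches work. A minimal sketch, using the same regular expression and string:

```js
const regExp = /(a+)b/g;
const str = 'ab aab';

const captures = [];
let match;
// .exec() returns null once there are no more matches
while ((match = regExp.exec(str)) !== null) {
  captures.push({capture: match[1], index: match.index});
}

console.log(captures);
// [ { capture: 'a', index: 0 }, { capture: 'aa', index: 3 } ]
console.log(regExp.lastIndex); // 0 (reset after the last, failed call)
```

If a pattern can match the empty string, this loop does not advance on its own, because .lastIndex stays the same after an empty match. In such cases, .matchAll() is usually the safer choice.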