This chapter explains the proposal “RegExp Named Capture Groups” by Gorkem Yakin and Daniel Ehrenberg.
Before we get to named capture groups, let’s take a look at numbered capture groups, which introduce the idea of capture groups in general.
Numbered capture groups enable you to take apart a string with a regular expression.
Successfully matching a regular expression against a string returns a match object matchObj. Putting a fragment of the regular expression in parentheses turns that fragment into a capture group: the part of the string that it matches is stored in matchObj.
Prior to this proposal, all capture groups were accessed by number: the capture group starting with the first parenthesis via matchObj[1], the capture group starting with the second parenthesis via matchObj[2], etc.
For example, the following code shows how numbered capture groups are used to extract year, month and day from a date in ISO format:
const RE_DATE = /([0-9]{4})-([0-9]{2})-([0-9]{2})/;
const matchObj = RE_DATE.exec('1999-12-31');
const year = matchObj[1]; // 1999
const month = matchObj[2]; // 12
const day = matchObj[3]; // 31
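Referring to capture groups via numbers has several disadvantages: you have to count parentheses to find the number of a group, the number alone does not tell you what the group captures, and changing the order of the groups forces you to change the matching code, too.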
All issues can be somewhat mitigated by defining constants for the numbers of the capture groups. However, named capture groups are an all-around superior solution.
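To illustrate the mitigation mentioned above, here is a minimal sketch that uses constants for the group numbers. The constant names GROUP_YEAR, GROUP_MONTH and GROUP_DAY are made up for this example; they are not part of any API.

```javascript
const RE_DATE = /([0-9]{4})-([0-9]{2})-([0-9]{2})/;

// Hypothetical constants that give each group number a readable name
const GROUP_YEAR = 1;
const GROUP_MONTH = 2;
const GROUP_DAY = 3;

const matchObj = RE_DATE.exec('1999-12-31');
const year = matchObj[GROUP_YEAR]; // '1999'
const month = matchObj[GROUP_MONTH]; // '12'
const day = matchObj[GROUP_DAY]; // '31'
```

The matching code becomes more readable, but the constants still have to be kept in sync with the regular expression by hand.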
The proposed feature is about identifying capture groups via names:
(?<year>[0-9]{4})
Here we have tagged the previous capture group #1 with the name year. The name must be a legal JavaScript identifier (think variable name or property name). After matching, you can access the captured string via matchObj.groups.year.
The captured strings are not properties of matchObj, because you don’t want them to clash with current or future properties created by the regular expression API.
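The following sketch shows why the separation matters. It deliberately uses a group name, index, that is also the name of an existing property of match objects (the group name is chosen only for this demonstration):

```javascript
// The group is named `index`, like the built-in property of match objects
const matchObj = /(?<index>[0-9]+)/.exec('abc123');

// Built-in property: position where the match starts
console.log(matchObj.index); // 3
// Named capture group: stored separately, so there is no clash
console.log(matchObj.groups.index); // 123

// Without named groups, the property `groups` is undefined
const plainMatch = /([0-9]+)/.exec('abc123');
console.log(plainMatch.groups); // undefined
```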
Let’s rewrite the previous code so that it uses named capture groups:
const RE_DATE = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
const matchObj = RE_DATE.exec('1999-12-31');
const year = matchObj.groups.year; // 1999
const month = matchObj.groups.month; // 12
const day = matchObj.groups.day; // 31
Named capture groups also create indexed entries, as if they were numbered capture groups:
const year2 = matchObj[1]; // 1999
const month2 = matchObj[2]; // 12
const day2 = matchObj[3]; // 31
Destructuring can help with getting data out of the match object:
const {groups: {day, year}} = RE_DATE.exec('1999-12-31');
console.log(year); // 1999
console.log(day); // 31
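One caveat with destructuring: exec() returns null if there is no match, in which case the destructuring pattern throws an exception. A minimal sketch of a guard could look as follows; parseDate is a hypothetical helper, not part of the proposal.

```javascript
const RE_DATE = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;

// Hypothetical helper: returns the captured parts or null
function parseDate(str) {
  const matchObj = RE_DATE.exec(str);
  if (matchObj === null) {
    // No match: nothing to destructure
    return null;
  }
  const {groups: {year, month, day}} = matchObj;
  return {year, month, day};
}

console.log(parseDate('1999-12-31')); // { year: '1999', month: '12', day: '31' }
console.log(parseDate('not a date')); // null
```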
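Named capture groups have the following benefits: a group is easier to find by its name than by counting parentheses, the matching code describes what it extracts, and you don’t have to change the matching code if you change the order of the capture groups.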
You can freely mix numbered and named capture groups.
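As a small sketch of mixing both kinds of groups, the following regular expression (made up for this example) names only its first group:

```javascript
// Group 1 is named, group 2 is only numbered
const RE_YEAR_MONTH = /(?<year>[0-9]{4})-([0-9]{2})/;
const matchObj = RE_YEAR_MONTH.exec('1999-12');

console.log(matchObj.groups.year); // 1999
console.log(matchObj[1]); // 1999 (the named group also has a number)
console.log(matchObj[2]); // 12

// Only named groups show up in matchObj.groups
console.log(Object.keys(matchObj.groups)); // [ 'year' ]
```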
\k<name> in a regular expression means: match the string that was previously matched by the named capture group name. For example:
const RE_TWICE = /^(?<word>[a-z]+)!\k<word>$/;
RE_TWICE.test('abc!abc'); // true
RE_TWICE.test('abc!ab'); // false
The backreference syntax for numbered capture groups works for named capture groups, too:
const RE_TWICE = /^(?<word>[a-z]+)!\1$/;
RE_TWICE.test('abc!abc'); // true
RE_TWICE.test('abc!ab'); // false
You can freely mix both syntaxes:
const RE_TWICE = /^(?<word>[a-z]+)!\k<word>!\1$/;
RE_TWICE.test('abc!abc!abc'); // true
RE_TWICE.test('abc!abc!ab'); // false
replace() and named capture groups
The string method replace() supports named capture groups in two ways.
First, you can mention their names in the replacement string:
const RE_DATE = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
console.log('1999-12-31'.replace(RE_DATE,
    '$<month>/$<day>/$<year>'));
// 12/31/1999
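The same replacement string also works with the flag /g, in which case every match is rewritten. The sketch below additionally shows that $<name> is only interpreted if the regular expression has named groups; otherwise it is inserted verbatim. The variable names and the sample text are made up for this example.

```javascript
const RE_DATE_G = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/g;

// With /g, each date in the text is reformatted
const text = 'From 1999-12-31 to 2000-01-01';
const result = text.replace(RE_DATE_G, '$<month>/$<day>/$<year>');
console.log(result); // From 12/31/1999 to 01/01/2000

// No named groups in the regular expression: $<year> stays literal text
const literal = '1999'.replace(/([0-9]{4})/, '$<year>');
console.log(literal); // $<year>
```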
Second, each replacement function receives an additional parameter that holds an object with data captured via named groups. For example (line A):
const RE_DATE = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
console.log('1999-12-31'.replace(
    RE_DATE,
    (g0,y,m,d,offset,input, {year, month, day}) => // (A)
        month+'/'+day+'/'+year));
// 12/31/1999
These are the parameters of the callback in line A:
g0 contains the whole matched substring, '1999-12-31'.
y, m, d are matches for the numbered groups 1–3 (which are created via the named groups year, month, day).
offset specifies where the match was found.
input contains the complete input string.
The last parameter is new and contains one property for each of the three named groups year, month and day. We use destructuring to access those properties.
The following code shows another way of accessing the last argument:
console.log('1999-12-31'.replace(RE_DATE,
    (...args) => {
        const {year, month, day} = args[args.length-1];
        return month+'/'+day+'/'+year;
    }));
// 12/31/1999
We receive all arguments via the rest parameter args. The last element of the Array args is the object with the data from the named groups. We access it via the index args.length-1.
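Note that the groups object is only passed if the regular expression has named groups. Otherwise, the last argument is the input string. The following sketch guards against that case; getGroups is a hypothetical helper written for this example.

```javascript
// Hypothetical helper: returns the groups object or undefined
function getGroups(args) {
  const last = args[args.length - 1];
  // Without named groups, the last argument is the input string
  return typeof last === 'object' ? last : undefined;
}

const seen = [];
// Regular expression with a named group
'1999'.replace(/(?<year>[0-9]{4})/, (...args) => {
  seen.push(getGroups(args));
  return '';
});
// Regular expression with only a numbered group
'1999'.replace(/([0-9]{4})/, (...args) => {
  seen.push(getGroups(args));
  return '';
});

console.log(seen[0].year); // 1999
console.log(seen[1]); // undefined
```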
If an optional named group does not match, its property is set to undefined (but still exists):
const RE_OPT_A = /^(?<as>a+)?$/;
const matchObj = RE_OPT_A.exec('');
// We have a match:
console.log(matchObj[0] === ''); // true
// Group <as> didn’t match anything:
console.log(matchObj.groups.as === undefined); // true
// But property `as` exists:
console.log('as' in matchObj.groups); // true
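Because the property is undefined, a destructuring default can supply a fallback value. In a replacement string, a reference to such a group produces the empty string. A short sketch (the variable names and the fallback text are made up for this example):

```javascript
const RE_OPT_A = /^(?<as>a+)?$/;

// The default kicks in because groups.as is undefined
const {groups: {as: letters = '(none)'}} = RE_OPT_A.exec('');
console.log(letters); // (none)

// In a replacement string, the unmatched group becomes an empty string
const replaced = ''.replace(RE_OPT_A, '[$<as>]');
console.log(replaced); // []
```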
transform-modern-regexp by Dmitry Soshnikov supports named capture groups.
In V8, the feature is enabled via the flag --harmony_regexp_named_captures.
You can check the version of V8 in your Node.js via:
node -p process.versions.v8
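Instead of checking version numbers, you can also test for the feature at runtime. The following is a minimal sketch; supportsNamedGroups is a hypothetical helper. It relies on the assumption that an engine without support rejects the pattern with an exception when the regular expression is created.

```javascript
// Hypothetical helper for runtime feature detection
function supportsNamedGroups() {
  try {
    // Use the constructor so that a syntax error is thrown at runtime,
    // where we can catch it, and not while this file is parsed
    const re = new RegExp('(?<x>a)');
    const matchObj = re.exec('a');
    return matchObj !== null
      && matchObj.groups !== undefined
      && matchObj.groups.x === 'a';
  } catch (err) {
    return false;
  }
}

console.log(supportsNamedGroups());
```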
In Chrome Canary (60.0+), you can enable named capture groups as follows. First, look up the path of the Chrome Canary binary via the about: URL. Then start Canary like this (you only need the double quotes if the path contains a space):
$ alias canary='"/tmp/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary"'
$ canary --js-flags='--harmony-regexp-named-captures'