The proposal “RegExp Named Capture Groups” by Gorkem Yakin, Daniel Ehrenberg is at stage 4. This blog post explains what it has to offer.
Before we get to named capture groups, let’s take a look at numbered capture groups, which introduce the general idea of capture groups.
Numbered capture groups enable you to take apart a string with a regular expression.
Successfully matching a regular expression against a string returns a match object matchObj. Putting a fragment of the regular expression in parentheses turns that fragment into a capture group: the part of the string that it matches is stored in matchObj.
Prior to this proposal, all capture groups were accessed by number: the capture group starting with the first parenthesis via matchObj[1], the capture group starting with the second parenthesis via matchObj[2], etc.
For example, the following code shows how numbered capture groups are used to extract year, month and day from a date in ISO format:
const RE_DATE = /([0-9]{4})-([0-9]{2})-([0-9]{2})/;
const matchObj = RE_DATE.exec('1999-12-31');
const year = matchObj[1]; // 1999
const month = matchObj[2]; // 12
const day = matchObj[3]; // 31
Referring to capture groups via numbers has several disadvantages:
All of these issues can be somewhat mitigated by defining constants for the numbers of the capture groups. However, named capture groups are an all-around superior solution.
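A minimal sketch of the constants-based mitigation mentioned above. The constant names YEAR, MONTH and DAY are hypothetical. They make the access code easier to read, but they still have to be kept in sync with the regular expression by hand:

```javascript
// Hypothetical constants for the group numbers (not part of any API)
const YEAR = 1;
const MONTH = 2;
const DAY = 3;

const RE_DATE_NUMBERED = /([0-9]{4})-([0-9]{2})-([0-9]{2})/;
const numberedMatch = RE_DATE_NUMBERED.exec('1999-12-31');

const yearViaConst = numberedMatch[YEAR];
const monthViaConst = numberedMatch[MONTH];
const dayViaConst = numberedMatch[DAY];
console.log(yearViaConst, monthViaConst, dayViaConst); // 1999 12 31
```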
The proposed feature is about identifying capture groups via names:
(?<year>[0-9]{4})
Here we have tagged the previous capture group #1 with the name year. The name must be a legal JavaScript identifier (think variable name or property name). After matching, you can access the captured string via matchObj.groups.year.
The captured strings are not properties of matchObj, because you don’t want them to clash with current or future properties created by the regular expression API.
Let’s rewrite the previous code so that it uses named capture groups:
const RE_DATE = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
const matchObj = RE_DATE.exec('1999-12-31');
const year = matchObj.groups.year; // 1999
const month = matchObj.groups.month; // 12
const day = matchObj.groups.day; // 31
Named capture groups also create indexed entries, as if they were numbered capture groups:
const year2 = matchObj[1]; // 1999
const month2 = matchObj[2]; // 12
const day2 = matchObj[3]; // 31
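The following sketch illustrates why named captures live in matchObj.groups and not directly on matchObj. The group name index is chosen deliberately for this example, because match objects already have a property index (the position where the match starts):

```javascript
// A named group whose name is the same as an existing match object property
const RE_INDEX = /(?<index>[0-9]+)/;
const clashMatch = RE_INDEX.exec('abc 42');

// The built-in property is untouched:
console.log(clashMatch.index); // 4
// The captured string is kept separately:
console.log(clashMatch.groups.index); // 42
```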
Destructuring can help with getting data out of the match object:
const {groups: {day, year}} = RE_DATE.exec('1999-12-31');
console.log(year); // 1999
console.log(day); // 31
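Note that exec() returns null if there is no match, in which case destructuring the result directly throws a TypeError. A minimal sketch of one way to guard against that, using a hypothetical helper parseIsoDate() and a variant of the regular expression that is anchored with ^ and $:

```javascript
const RE_ISO_DATE = /^(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})$/;

// Hypothetical helper: returns the date parts, or null if str is not a date
function parseIsoDate(str) {
  const matchObj = RE_ISO_DATE.exec(str);
  if (matchObj === null) {
    return null;
  }
  // Only destructure once we know that there is a match
  const {groups: {year, month, day}} = matchObj;
  return {year, month, day};
}

console.log(parseIsoDate('1999-12-31')); // { year: '1999', month: '12', day: '31' }
console.log(parseIsoDate('not a date')); // null
```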
Named capture groups have the following benefits:
You can freely mix numbered and named capture groups.
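As a small sketch of such mixing, the following regular expression names the first and third group and leaves the second one unnamed. Every group still gets a number, based on the position of its opening parenthesis, but only the named ones show up in groups:

```javascript
const RE_MIXED = /(?<year>[0-9]{4})-([0-9]{2})-(?<day>[0-9]{2})/;
const mixedMatch = RE_MIXED.exec('1999-12-31');

console.log(mixedMatch.groups.year); // 1999
console.log(mixedMatch[2]); // 12 (the unnamed group is only reachable by number)
console.log(mixedMatch.groups.day); // 31
console.log(mixedMatch[3]); // 31 (named groups are numbered, too)
console.log(Object.keys(mixedMatch.groups)); // [ 'year', 'day' ]
```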
\k<name> in a regular expression means: match the string that was previously matched by the named capture group name. For example:
const RE_TWICE = /^(?<word>[a-z]+)!\k<word>$/;
RE_TWICE.test('abc!abc'); // true
RE_TWICE.test('abc!ab'); // false
The backreference syntax for numbered capture groups works for named capture groups, too:
const RE_TWICE = /^(?<word>[a-z]+)!\1$/;
RE_TWICE.test('abc!abc'); // true
RE_TWICE.test('abc!ab'); // false
You can freely mix both syntaxes:
const RE_TWICE = /^(?<word>[a-z]+)!\k<word>!\1$/;
RE_TWICE.test('abc!abc!abc'); // true
RE_TWICE.test('abc!abc!ab'); // false
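A slightly more practical sketch of a named backreference: matching text in quotes, where the closing quote must be the same character as the opening one. The group names quote and text are made up for this example:

```javascript
// \k<quote> only matches the quote character that opened the string
const RE_QUOTED = /(?<quote>['"])(?<text>.*?)\k<quote>/;

const quotedMatch = RE_QUOTED.exec(`say "it's fine" now`);
console.log(quotedMatch.groups.quote); // "
console.log(quotedMatch.groups.text); // it's fine

// Opening and closing quotes differ, so there is no match:
console.log(RE_QUOTED.test(`"mismatched'`)); // false
```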
replace() and named capture groups
The string method replace() supports named capture groups in two ways.
First, you can mention their names in the replacement string:
const RE_DATE = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
console.log('1999-12-31'.replace(RE_DATE,
'$<month>/$<day>/$<year>'));
// 12/31/1999
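The same replacement syntax also works if the regular expression has the flag /g, in which case every match is rewritten. A minimal sketch, with an input string made up for this example:

```javascript
const RE_DATE_GLOBAL = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/g;
const datesText = 'From 1999-12-31 to 2000-01-01';

// $<name> is resolved separately for each match
const replacedText = datesText.replace(RE_DATE_GLOBAL, '$<day>.$<month>.$<year>');
console.log(replacedText); // From 31.12.1999 to 01.01.2000
```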
Second, each replacement function receives an additional parameter that holds an object with data captured via named groups. For example (line A):
const RE_DATE = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
console.log('1999-12-31'.replace(
RE_DATE,
(g0,y,m,d,offset,input, {year, month, day}) => // (A)
month+'/'+day+'/'+year));
// 12/31/1999
These are the parameters of the callback in line A:
g0 contains the whole matched substring, '1999-12-31'.
y, m, d are matches for the numbered groups 1–3 (which are created via the named groups year, month, day).
offset specifies where the match was found.
input contains the complete input string.
The last parameter holds an object with one property for each of the named groups year, month and day. We use destructuring to access those properties.
The following code shows another way of accessing the last argument:
console.log('1999-12-31'.replace(RE_DATE,
(...args) => {
const {year, month, day} = args[args.length-1];
return month+'/'+day+'/'+year;
}));
// 12/31/1999
We receive all arguments via the rest parameter args. The last element of the Array args is the object with the data from the named groups. We access it via the index args.length-1.
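A replacement function is mainly useful when the result has to be computed, which a replacement string cannot do. The following sketch turns the month number into a name. The lookup table MONTH_NAMES is a hypothetical helper for this example:

```javascript
const RE_DATE_FOR_NAMES = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
// Hypothetical lookup table for this sketch
const MONTH_NAMES = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
  'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

const withMonthName = '1999-12-31'.replace(
  RE_DATE_FOR_NAMES,
  (...args) => {
    // The object with the named groups is the last argument
    const {year, month, day} = args[args.length - 1];
    return `${Number(day)} ${MONTH_NAMES[Number(month) - 1]} ${year}`;
  });
console.log(withMonthName); // 31 Dec 1999
```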
If an optional named group does not match, its property is set to undefined (but still exists):
const RE_OPT_A = /^(?<as>a+)?$/;
const matchObj = RE_OPT_A.exec('');
// We have a match:
console.log(matchObj[0] === ''); // true
// Group <as> didn’t match anything:
console.log(matchObj.groups.as === undefined); // true
// But property `as` exists:
console.log('as' in matchObj.groups); // true
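Because the property of an unmatched group is undefined, default values in destructuring can be used to provide a fallback. A minimal sketch, with a made-up group name bs and a made-up fallback string:

```javascript
const RE_OPT_B = /^(?<bs>b+)?$/;

// Group <bs> does not match, so the default value is used
const {groups: {bs = '(no match)'}} = RE_OPT_B.exec('');
console.log(bs); // (no match)

// Group <bs> matches, so the default value is ignored
const {groups: {bs: bs2 = '(no match)'}} = RE_OPT_B.exec('bbb');
console.log(bs2); // bbb
```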
transform-modern-regexp by Dmitry Soshnikov supports named capture groups.
The relevant V8 is not yet in Node.js (7.10.0). You can check via:
node -p process.versions.v8
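Instead of comparing version numbers, you can also test for the feature itself. The following is a sketch with a hypothetical helper name. It creates the regular expression via the RegExp constructor, because a regular expression literal with unsupported syntax would already fail when the file is parsed:

```javascript
// Hypothetical helper: true if this engine understands named capture groups
function supportsNamedCaptureGroups() {
  try {
    // Throws a SyntaxError in engines that cannot parse (?<x>...)
    const matchObj = new RegExp('(?<x>a)').exec('a');
    return matchObj !== null
      && matchObj.groups !== undefined
      && matchObj.groups.x === 'a';
  } catch (err) {
    return false;
  }
}

console.log(supportsNamedCaptureGroups());
```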
In Chrome Canary (60.0+), you can enable named capture groups as follows. First, look up the path of the Chrome Canary binary via the about: URL. Then start Canary like this (you only need the double quotes if the path contains a space):
$ alias canary='"/tmp/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary"'
$ canary --js-flags='--harmony-regexp-named-captures'