Tokens in Regular Expressions
In this section...
Introduction
Multiple Tokens
Unmatched Tokens
Tokens in Replacement Text
Named Capture
Introduction
Parentheses used in a regular expression not only group elements of that expression
together, but also designate any matches found for that group as tokens. You can use
tokens to match other parts of the same text. One advantage of using tokens is that they
remember what they matched, so you can recall and reuse matched text in the process of
searching or replacing.
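The sketch below is a minimal illustration of that idea. The input text and variable names are hypothetical. Two parenthesized groups each capture a token. regexp returns the remembered text, and regexprep reuses it in the replacement, where $1 and $2 recall tokens 1 and 2.

```matlab
% Hypothetical input: a "key=value" pair
str = 'color=blue';

% Two parenthesized groups create two tokens per match
tok = regexp(str, '(\w+)=(\w+)', 'tokens');
% tok is a 1x1 cell array; its single element is the 1x2 cell {'color','blue'}
key   = tok{1}{1};
value = tok{1}{2};

% Reuse the remembered text when replacing: $2 and $1 recall tokens 2 and 1
swapped = regexprep(str, '(\w+)=(\w+)', '$2=$1');
disp(swapped)   % displays blue=color
```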
Each token in the expression is assigned a number, starting from 1 and counting from left
to right. To refer to a token later in the same expression, use a backslash followed by the
token number. For example, to reference the token generated by the third set of
parentheses in the expression, use \3.
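As a further sketch, which uses a hypothetical input string, the expression below contains two tokens. The first (\w) is token 1 and the second is token 2. The trailing \2\1 then requires the same two characters to reappear in reverse order, so the expression matches four-character sequences such as abba.

```matlab
% Hypothetical input containing several four-character words
str = 'abba cddc xyzw noon';

% (\w) is token 1 and the next (\w) is token 2, numbered left to right.
% \2\1 requires token 2 followed by token 1, so only mirrored sequences match.
pal = regexp(str, '(\w)(\w)\2\1', 'match');
disp(pal)
```

Here pal contains the matches abba, cddc, and noon. The word xyzw is not matched because its last two characters do not repeat the first two in reverse.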
As a simple example, to search a character array for identical consecutive characters, you
can capture the first character as a token and then search for the same character
immediately after it. In the expression shown below, the (\S) phrase creates a token
whenever regexp matches any nonwhitespace character in the character array. The second
part of the expression, '\1', looks for a second instance of the same character immediately
following the first.
poe = ['While I nodded, nearly napping, ' ...
'suddenly there came a tapping,'];
[mat,tok,ext] = regexp(poe, '(\S)\1', 'match', ...
'tokens', 'tokenExtents');
mat
mat =
1×4 cell array