MATLAB Programming Fundamentals - MathWorks

Tokens in Regular Expressions

In this section: “Introduction”, “Multiple Tokens”, “Unmatched Tokens”, “Tokens in Replacement Text”, “Named Capture”

Introduction

Parentheses used in a regular expression not only group elements of that expression together, but also designate any matches found for that group as tokens. You can use tokens to match other parts of the same text. One advantage of using tokens is that they remember what they matched, so you can recall and reuse matched text in the process of searching or replacing.
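The following is a minimal sketch of both uses, run on a made-up key=value string; the variable names and input text are illustrative only. It captures each key and value as two tokens with the 'tokens' option of regexp. It then reuses the captured text in regexprep, where $1 and $2 stand for tokens 1 and 2 in the replacement text.

```matlab
% Made-up input: two key=value pairs
str = 'width=42, height=17';

% Searching: each match yields a cell array of its tokens (key, value)
tok = regexp(str, '(\w+)=(\d+)', 'tokens');
% tok{1} is {'width','42'} and tok{2} is {'height','17'}

% Replacing: $2 and $1 reuse the captured value and key in swapped order
swapped = regexprep(str, '(\w+)=(\d+)', '$2=$1');
% swapped is '42=width, 17=height'
```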

Each token in the expression is assigned a number, starting from 1, in the order in which its opening parenthesis appears from left to right. To refer to a token later in the same expression, use a backslash followed by the token number. For example, to reference the token generated by the third set of parentheses in the expression, use \3.
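Here is a small sketch of numbered references on a made-up string. The pattern captures two digits as tokens 1 and 2. It then uses \2\1 to require the same two digits again in reverse order, so it matches only four-digit palindromes such as 1221.

```matlab
% Made-up input containing two palindromic and one non-palindromic number
str = 'codes: 1221 3456 7887';

% (\d) is token 1, the second (\d) is token 2; \2\1 repeats them reversed
pal = regexp(str, '(\d)(\d)\2\1', 'match');
% pal is {'1221','7887'}
```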

As a simple example, suppose you want to find pairs of identical consecutive characters in a character array. You can capture the first character as a token and then look for the same character immediately after it. In the expression shown below, the (\S) phrase creates a token whenever regexp matches any nonwhitespace character in the character array. The second part of the expression, '\1', looks for a second instance of the same character immediately following the first.

poe = ['While I nodded, nearly napping, ' ...
       'suddenly there came a tapping,'];

[mat,tok,ext] = regexp(poe, '(\S)\1', 'match', ...
       'tokens', 'tokenExtents');
mat

mat =

1×4 cell array
