Chapter 16. PARSING AND STRING EVALUATION
Tokenizing
Regular Expressions
Defining Regular Expressions
Using Regular Expressions in PHP Scripts
Parsing is the act of breaking a whole into components, usually a sentence into words.
PHP must parse the code you write as a first step in turning a script into an HTML
document. Sooner or later you will need to extract or verify data
stored in a string. This could be as simple as a tab-delimited list. It could be as
complicated as the string a browser uses to identify itself to a Web server. You may
choose to tokenize the string, breaking it into pieces. Or you may choose to apply a
regular expression. This chapter examines PHP's functions for parsing and string
evaluation.
Tokenizing
PHP offers a simple model for tokenizing a string. Characters of your choice
are treated as separators, and the runs of characters between separators are tokens.
You may change the set of separators with each token you pull from a string, which is
handy for irregular strings, that is, ones that aren't simply comma-separated lists.
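As a minimal sketch of changing separators from one token to the next, consider a record that mixes three different delimiters. The record format and the variable names here are hypothetical, chosen only for illustration. Only the first call to strtok passes the string. Later calls pass just the separator set and continue where the previous call stopped.

```php
<?php
// Hypothetical irregular record: key=value, then a label and a comma list.
$record = "user=leon;langs=PHP,C,Perl";

// The first call supplies the string and the first separator set.
$key = strtok($record, "=");

// Later calls supply only the separators, which may differ each time.
$value = strtok(";");
$label = strtok("=");

// The rest of the string is a plain comma-separated list.
$langs = [];
while (($lang = strtok(",")) !== false) {
    $langs[] = $lang;
}

print("$key: $value\n");
print("$label: " . implode(" ", $langs) . "\n");
```

The loop compares against false with !== rather than testing the token directly, so a token such as "0" does not end the loop early.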
Listing 16.1 accepts a sentence and breaks it into words using the strtok function,
described in Chapter 9, "Data Functions." As far as the script is concerned, a word
is bounded by spaces, punctuation, or either end of the sentence. Single and double
quotes are left as part of the word.
Listing 16.1 Tokenizing a String