APPENDIX B ■ A SIMPLE PARSER
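$reader = new \gi\parse\StringReader( $user_in );   // $user_in and $context are assumed to be set up earlier in the listing
$scanner = new \gi\parse\Scanner( $reader, $context );
// request tokens until the scanner reports the end of the input
while ( $scanner->nextToken() != \gi\parse\Scanner::EOF ) {
    print $scanner->token();                       // the text matched by the current token
    print "\t{$scanner->char_no()}";               // current position in the input
    print "\t{$scanner->getTypeString()}\n";       // the token's type, as a string
}
I initialize a Scanner object and then loop through the tokens in the given string by
repeatedly calling nextToken(). The token() method returns the portion of the input matched
by the current token. char_no() reports how many characters have been consumed so far (that is,
where I am in the string), and getTypeString() returns a string version of the constant flag
representing the current token's type. Whitespace tokens consist of a single space, so their
first column appears blank. This is what the output should look like: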
$       1   CHAR
input   6   WORD
        7   WHITESPACE
equals  13  WORD
        14  WHITESPACE
'       15  APOS
4       16  WORD
'       17  APOS
        18  WHITESPACE
or      20  WORD
        21  WHITESPACE
$       22  CHAR
input   27  WORD
        28  WHITESPACE
equals  34  WORD
        35  WHITESPACE
'       36  APOS
four    40  WORD
'       41  APOS
I could, of course, match finer-grained tokens than this, but this is good enough for my purposes.
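To make the scanner loop concrete, here is a minimal, self-contained sketch of a tokenizer that exposes the same four methods (nextToken(), token(), char_no(), getTypeString()). The class name MiniScanner, its constants, and its regular-expression rules are assumptions for illustration only; the appendix's \gi\parse\Scanner works through a StringReader and a Context and is not reproduced here. Run on the same input string, this sketch prints the same 19 token/position/type rows shown above.

```php
<?php
// Illustrative sketch only: MiniScanner and its rules are assumptions,
// not the gi\parse\Scanner class used in this appendix.
final class MiniScanner
{
    public const EOF = 0;
    public const WORD = 1;
    public const WHITESPACE = 2;
    public const APOS = 3;
    public const CHAR = 4;

    private const NAMES = ['EOF', 'WORD', 'WHITESPACE', 'APOS', 'CHAR'];

    // Rules are tried in order; the first one that matches at the current
    // offset wins. \G anchors each pattern to that offset.
    private const RULES = [
        self::WORD       => '/\G\w+/',
        self::WHITESPACE => '/\G\s+/',
        self::APOS       => "/\\G'/",
        self::CHAR       => '/\G./s',   // fallback: any single character
    ];

    private string $input;
    private int $pos = 0;
    private string $token = '';
    private int $type = self::EOF;

    public function __construct(string $input)
    {
        $this->input = $input;
    }

    // Consume the next token and return its type constant (EOF when done).
    public function nextToken(): int
    {
        $this->token = '';
        $this->type = self::EOF;
        if ($this->pos >= strlen($this->input)) {
            return $this->type;
        }
        foreach (self::RULES as $type => $pattern) {
            if (preg_match($pattern, $this->input, $m, 0, $this->pos) === 1) {
                $this->token = $m[0];
                $this->type = $type;
                $this->pos += strlen($m[0]);
                break;
            }
        }
        return $this->type;
    }

    public function token(): string
    {
        return $this->token;
    }

    // Characters consumed so far, i.e. the offset just past the current token.
    public function char_no(): int
    {
        return $this->pos;
    }

    public function getTypeString(): string
    {
        return self::NAMES[$this->type];
    }
}

// Same driver loop as in the text, minus the reader and context objects.
$scanner = new MiniScanner("\$input equals '4' or \$input equals 'four'");
while ($scanner->nextToken() !== MiniScanner::EOF) {
    print $scanner->token();
    print "\t{$scanner->char_no()}";
    print "\t{$scanner->getTypeString()}\n";
}
```

The order of the rules is the design choice that matters here: because WORD is tried before CHAR, a run such as input is consumed in one step, while $ matches none of the specific rules and falls through to the catch-all CHAR rule.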
Breaking up the string is the easy part. How do I build up a grammar in code?
The Parser
One approach is to build a tree of Parser objects. Here is the abstract Parser class that I will be using:
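namespace gi\parse;

abstract class Parser {
    const GIP_RESPECTSPACE = 1;          // option flag: treat whitespace as significant
    protected $respectSpace = false;
    protected static $debug = false;     // shared by all parsers
    protected $discard = false;
    protected $name;
    private static $count=0;             // used to generate unique default names

    function __construct( $name=null, $options=null ) {
        if ( is_null( $name ) ) {
            // no name supplied: build one from the class name and a counter
            self::$count++;
            $this->name = get_class( $this )." (".self::$count.")";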