
Parsers and Tokenizer

Motivation and use cases

The parsers are used to read the values of special QFQ columns. These values range from simple lists of key-value pairs to extended, JSON-like strings.

Overview of classes

All classes are defined in the namespace IMATHUZH\Qfq\Core\Parser.

StringTokenizer

This class provides a generator that iterates over a string and returns tokens separated by predefined delimiters. The delimiters are searched for in a smart way:

  • delimiters escaped with a backslash are ignored in the search
  • the tokenizer distinguishes between escaping and escaped backslashes, e.g. the colon (as a delimiter) is ignored in the string ab\:cd but not in ab\\:cd
  • a part of a string between quotes is treated as plain text: all delimiters inside it are ignored, and the quote characters are removed
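The scanning rules above can be sketched as a small PHP generator. This is a hypothetical illustration, not the QFQ StringTokenizer code: the function name tokenizeSketch is made up, tokens are plain strings instead of Token objects, and it assumes that a backslash always escapes the next character and is itself dropped.

```php
<?php
// Hypothetical sketch of the tokenizing rules, not QFQ's StringTokenizer.
// Yields [string $token, ?string $delimiter]; $delimiter is null at the end.
function tokenizeSketch(string $input, string $delimiters): Generator
{
    $token = '';
    $quote = null;                  // active quote character, null outside quotes
    $len = strlen($input);
    for ($i = 0; $i < $len; $i++) {
        $c = $input[$i];
        if ($c === '\\' && $i + 1 < $len) {
            // The escaped character (delimiter, quote or backslash) is kept literally
            $token .= $input[++$i];
        } elseif ($quote !== null) {
            if ($c === $quote) {
                $quote = null;      // closing quote is dropped
            } else {
                $token .= $c;       // delimiters inside quotes are plain text
            }
        } elseif ($c === '"' || $c === "'") {
            $quote = $c;            // opening quote is dropped
        } elseif (strpos($delimiters, $c) !== false) {
            yield [$token, $c];     // token ended by this delimiter
            $token = '';
        } else {
            $token .= $c;
        }
    }
    yield [$token, null];           // end of string reached
}
```

Each yielded pair has the same shape as in the StringTokenizer usage example, so it can be unpacked with list($token, $delimiter); for instance, tokenizeSketch('ab:cd,ef|gh', ':,|') produces the tokens 'ab', 'cd', 'ef' and 'gh'.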

Examples with the three delimiters :,| (colon, comma and pipe):

Input string      Resulting sequence of tokens
ab:cd,ef|gh       'ab' 'cd' 'ef' 'gh'
"ab:cd",ef\|gh    'ab:cd' 'ef|gh'

Usage

$tokenizer = new StringTokenizer(':,|');
foreach ($tokenizer->tokenized('ab:cd,ef\|gh') as list($token, $delimiter)) {
   // $token is an instance of Token class:
   //     $token->value    is a string representation of the token
   //     $token->isString is true if the token is a string (quotes were used)
   //     $token->empty()  is true for a token generated only from whitespace characters
   // $delimiter === null    when the end of the string is reached
}

SimpleParser

This class parses a string into a list of tokens separated by delimiters. Compared to StringTokenizer, the returned tokens are literal values or special objects, and the processing can be tweaked by options provided as an array in the second constructor parameter.

Option key            Type   Meaning
OPTION_PARSE_NUMBERS  bool   Convert tokens to numbers when possible
OPTION_KEEP_SIGN      bool   Create an instance of SignedNumber if a number has an explicit sign
OPTION_KEY_IS_VALUE   bool   Assign keys that have no value their own name as the value
OPTION_EMPTY          any    The value used for empty tokens

Note that OPTION_KEY_IS_VALUE is not used by SimpleParser itself, only by its derived classes.
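The effect of the options can be illustrated with a hypothetical conversion function. The constant values, the SignedNumberSketch class and convertTokenSketch are invented for this sketch and are not the QFQ API. The defaults (numbers parsed unless disabled, null for empty tokens) are assumptions loosely inferred from the SimpleParser usage example, and the special values are the defaults listed there.

```php
<?php
// Hypothetical stand-ins for the option keys; the real constants are defined
// by the QFQ parser classes and may have different values.
const OPTION_PARSE_NUMBERS = 'parse-numbers';
const OPTION_KEEP_SIGN = 'keep-sign';
const OPTION_EMPTY = 'empty';

// Hypothetical stand-in for SignedNumber: a number written with an explicit sign.
final class SignedNumberSketch
{
    public $value;

    public function __construct($value)
    {
        $this->value = $value;
    }
}

// Sketch of converting one raw token; $isString mirrors $token->isString.
function convertTokenSketch(string $raw, bool $isString, array $options = [])
{
    // Default special values, as listed in the SimpleParser usage example
    $special = ['null' => null, 'true' => true, 'yes' => true, 'false' => false, 'no' => false];

    if ($isString) {
        return $raw;                                // quoted text stays a literal string
    }
    $text = trim($raw);
    if ($text === '') {
        return $options[OPTION_EMPTY] ?? null;      // assumed default: null
    }
    if (array_key_exists($text, $special)) {
        return $special[$text];
    }
    if (($options[OPTION_PARSE_NUMBERS] ?? true) && is_numeric($text)) {
        $number = $text + 0;                        // int or float, as PHP infers
        $signed = $text[0] === '+' || $text[0] === '-';
        return ($signed && ($options[OPTION_KEEP_SIGN] ?? false))
            ? new SignedNumberSketch($number)
            : $number;
    }
    return $text;
}
```

With OPTION_KEEP_SIGN enabled, '+3600' becomes a SignedNumberSketch holding 3600 instead of the plain integer 3600, so later code can still tell that the sign was written explicitly.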

Note: the option OPTION_KEEP_SIGN is used by the jwt column, so that the claims exp and nbf can be specified either as absolute timestamps (no plus sign) or as relative ones (with a plus sign).
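How a relative claim might be resolved is sketched here, assuming that a sign-preserving number is an offset in seconds added to the current time. SignedNumberSketch and resolveClaimTimeSketch are hypothetical stand-ins, not QFQ's SignedNumber class or its jwt column code.

```php
<?php
// Hypothetical stand-in for QFQ's SignedNumber: a number written with a sign.
final class SignedNumberSketch
{
    public $value;

    public function __construct(int $value)
    {
        $this->value = $value;
    }
}

// Sketch: turns an exp/nbf claim into an absolute Unix timestamp.
// Signed numbers (e.g. "+3600") are treated as offsets from $now;
// unsigned numbers are assumed to be absolute timestamps already.
function resolveClaimTimeSketch($value, int $now): int
{
    if ($value instanceof SignedNumberSketch) {
        return $now + $value->value;
    }
    return (int) $value;
}

$now = 1700000000;  // fixed "current time" so the example is reproducible
$claims = ['exp' => new SignedNumberSketch(3600), 'nbf' => 1700000000];
$resolved = array_map(function ($v) use ($now) {
    return resolveClaimTimeSketch($v, $now);
}, $claims);
// $resolved === ['exp' => 1700003600, 'nbf' => 1700000000]
```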

Usage

$parser = new SimpleParser(":|");
// By default five special values are configured:
//    'null' -> null
//    'true', 'yes' -> true
//    'false', 'no' -> false
// More can be defined by updating $specialValues property:
$parser->specialValues['qfq'] = 'QFQ is great';

// This returns an array ['abc', 'efg', 123, true, 'QFQ is great']
$parser->parse("abc:efg|123|yes:qfq");

// The tokens can be iterated as follows
foreach ($parser->iterate("abc:efg|123|yes") as $token) {
   // process $token
}

KVPairListParser

This class parses a list of key-value pairs into an associative array. It takes two arguments: the list separator and the key-value separator.

Usage

// Default separators are , and :
$parser = new KVPairListParser("|", "=");
$parser->parse("a=43|b=false|xyz='a|b'");
// result: [ 'a' => 43, 'b' => false, 'xyz' => 'a|b' ]

foreach ($parser->iterate("a=43|b=false|xyz='a|b'") as $key => $value) {
   // process each $key => $value pair
}
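A single pass over the input is enough to illustrate key-value list parsing: split on list separators outside quotes, split each pair at the first key-value separator, and convert unquoted values. This is a hypothetical sketch, not QFQ's KVPairListParser: parseKvSketch and convertValueSketch are invented names, backslash escapes are left out for brevity, and storing null for a bare key when the key-is-value behaviour is off is an assumption.

```php
<?php
// Hypothetical helper: converts an unquoted value using the default special
// values and numbers; quoted values stay literal strings.
function convertValueSketch(string $raw, bool $quoted)
{
    if ($quoted) {
        return $raw;
    }
    $text = trim($raw);
    $special = ['null' => null, 'true' => true, 'yes' => true, 'false' => false, 'no' => false];
    if (array_key_exists($text, $special)) {
        return $special[$text];
    }
    return is_numeric($text) ? $text + 0 : $text;
}

// Parses "k=v|k=v" style lists. Separators inside quotes are kept, backslash
// escapes are not handled, and $keyIsValue mimics OPTION_KEY_IS_VALUE.
function parseKvSketch(string $input, string $listSep = ',', string $kvSep = ':', bool $keyIsValue = false): array
{
    $result = [];
    $key = null;      // key of the current pair once $kvSep has been seen
    $buf = '';        // text collected for the current key or value
    $quoted = false;  // whether the current value came from quotes
    $quote = null;    // active quote character
    $len = strlen($input);
    for ($i = 0; $i <= $len; $i++) {
        $c = $i < $len ? $input[$i] : null;   // null marks the end of the input
        if ($c !== null && $quote !== null) {
            if ($c === $quote) {
                $quote = null;
            } else {
                $buf .= $c;
            }
        } elseif ($c === "'" || $c === '"') {
            $quote = $c;
            $quoted = true;
        } elseif ($c === $kvSep && $key === null) {
            $key = trim($buf);                // only the first $kvSep splits the pair
            $buf = '';
            $quoted = false;
        } elseif ($c === $listSep || $c === null) {
            if ($key !== null) {
                $result[$key] = convertValueSketch($buf, $quoted);
            } elseif (trim($buf) !== '') {
                // A key without a value (null here is an assumed default)
                $result[trim($buf)] = $keyIsValue ? trim($buf) : null;
            }
            $key = null;
            $buf = '';
            $quoted = false;
        } else {
            $buf .= $c;
        }
    }
    return $result;
}
```

Called as parseKvSketch("a=43|b=false|xyz='a|b'", '|', '='), the sketch returns the same array as the KVPairListParser example: [ 'a' => 43, 'b' => false, 'xyz' => 'a|b' ].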

MixedTypeParser

This parser understands both lists and dictionaries, and both structures can be nested. The constructor takes six delimiters in one string: the list separator, the key-value separator, the list delimiters (begin and end), and the dictionary delimiters (begin and end). The default value is ,:[]{}. It is also possible to replace the list or dictionary delimiters with spaces, in which case the parser does not recognize that structure. For instance:

  • new MixedTypeParser(',:[]') can parse nested lists, but not dictionaries (the string is padded with spaces to six characters)
  • new MixedTypeParser(',:  {}') can parse nested dictionaries, but not lists
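The delimiter string can be interpreted as sketched here, assuming it is padded on the right with spaces and that a space disables the corresponding delimiter. readDelimitersSketch is a hypothetical helper, not part of QFQ.

```php
<?php
// Hypothetical sketch: splits a MixedTypeParser-style delimiter string into
// named roles. Missing characters are padded with spaces (assumed), and a
// space in a bracket position means that structure is disabled (null).
function readDelimitersSketch(string $delimiters = ',:[]{}'): array
{
    $d = str_pad($delimiters, 6, ' ');   // pads on the right by default
    $orNull = function (string $c): ?string {
        return $c === ' ' ? null : $c;
    };
    return [
        'listSep'   => $d[0],
        'kvSep'     => $d[1],
        'listBegin' => $orNull($d[2]),
        'listEnd'   => $orNull($d[3]),
        'dictBegin' => $orNull($d[4]),
        'dictEnd'   => $orNull($d[5]),
    ];
}
```

For example, readDelimitersSketch(',:[]') keeps both list delimiters and leaves both dictionary entries null.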

This parser can be seen as an extension of a JSON parser in which strings do not have to be enclosed in quotes.
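Because the structures nest, a recursive-descent parser is a natural way to illustrate this grammar. The MixedSketch class is a hypothetical sketch with the default delimiters ,:[]{} hard-coded; it supports unquoted strings and numbers but no quotes or options. Its parseList and parseDictionary simply wrap the input in the outer brackets, which is an assumption about how the real methods relate to parse().

```php
<?php
// Hypothetical recursive-descent sketch, not QFQ's MixedTypeParser.
//   value := list | dict | scalar
//   list  := '[' [value (',' value)*] ']'
//   dict  := '{' [scalar ':' value (',' scalar ':' value)*] '}'
final class MixedSketch
{
    private $s;
    private $i = 0;

    private function __construct(string $s) { $this->s = $s; }

    public static function parse(string $s)
    {
        $p = new self($s);
        $value = $p->value();
        if ($p->peek() !== null) {
            throw new InvalidArgumentException("Unexpected input at offset {$p->i}");
        }
        return $value;
    }

    // Top-level list/dictionary without the outer brackets (assumed behaviour).
    public static function parseList(string $s): array
    {
        return self::parse('[' . $s . ']');
    }

    public static function parseDictionary(string $s): array
    {
        return self::parse('{' . $s . '}');
    }

    // Skips whitespace and returns the next character, or null at the end.
    private function peek(): ?string
    {
        $len = strlen($this->s);
        while ($this->i < $len && ctype_space($this->s[$this->i])) {
            $this->i++;
        }
        return $this->i < $len ? $this->s[$this->i] : null;
    }

    private function expect(string $c): void
    {
        if ($this->peek() !== $c) {
            throw new InvalidArgumentException("Expected '$c' at offset {$this->i}");
        }
        $this->i++;
    }

    private function value()
    {
        $c = $this->peek();
        if ($c === '[' || $c === '{') {
            return $this->sequence($c, $c === '[' ? ']' : '}', $c === '{');
        }
        return $this->scalar();
    }

    // Parses a bracketed list, or a braced dictionary when $isDict is true.
    private function sequence(string $open, string $close, bool $isDict): array
    {
        $this->expect($open);
        $result = [];
        if ($this->peek() === $close) {
            $this->i++;
            return $result;
        }
        while (true) {
            if ($isDict) {
                $key = (string) $this->scalar();
                $this->expect(':');
                $result[$key] = $this->value();
            } else {
                $result[] = $this->value();
            }
            if ($this->peek() !== ',') {
                break;
            }
            $this->i++;
        }
        $this->expect($close);
        return $result;
    }

    // Reads text up to the next delimiter; numeric text becomes int/float.
    private function scalar()
    {
        $this->peek();   // skip leading whitespace
        $start = $this->i;
        $len = strlen($this->s);
        while ($this->i < $len && strpos(',:[]{}', $this->s[$this->i]) === false) {
            $this->i++;
        }
        $text = trim(substr($this->s, $start, $this->i - $start));
        return is_numeric($text) ? $text + 0 : $text;
    }
}
```

Each nesting level is handled by a recursive call to value(), so lists inside dictionaries (and vice versa) need no extra code.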

Usage

$parser = new MixedTypeParser(',:[]{}', [ /* options */ ]);
$parser->parse('[0, { a: 14, b: 16 }, abc]');
$parser->parseList('abc, [x, y, z], {a:15}, xyz');
$parser->parseDictionary('num:15, arr:[x, y, z], dict:{a:15}, str:xyz');

Note: MixedTypeParser has no meaningful iterate() method.