Krzysztof Putyra authored
Parsers and Tokenizer
Motivation and use cases
The parsers are used to interpret values of special QFQ columns, ranging from simple lists of key-value pairs to enhanced JSON strings.
Overview of classes
All classes are defined in the namespace `IMATHUZH\Qfq\Core\Parser`.
StringTokenizer
This class provides a generator that iterates over a string and yields tokens separated by predefined delimiters. The delimiters are searched for in a smart way:
- delimiters escaped with a backslash are ignored in the search
- the parser distinguishes between escaping and escaped backslashes, i.e. the colon (as a delimiter) is ignored in the string `ab\:cd` but not in `ab\\:cd`
- a part of a string between quotes is treated as plain text: all delimiters in it are ignored (and the quote characters are removed).
Examples with delimiters `:,|`:

| Input string | Resulting sequence of tokens |
|---|---|
| `ab:cd,ef\|gh` | `'ab'` `'cd'` `'ef'` `'gh'` |
| `"ab:cd",ef\\|gh` | `'ab:cd'` `'ef\|gh'` |
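To make the escaping and quoting rules concrete, here is a minimal PHP sketch of a delimiter-aware tokenizer. It is not the `StringTokenizer` implementation: the function name `tokenizeSketch` is hypothetical, it returns plain `[token, delimiter]` pairs instead of `Token` objects, it does not trim whitespace, and it assumes that an escaping backslash is dropped from the token (as the `ef|gh` example suggests).

```php
<?php
// Sketch only: a hypothetical stand-in for StringTokenizer, not QFQ code.

/**
 * Splits $input at any character listed in $delimiters.
 * Returns [token, delimiter] pairs; the delimiter is null for the last token.
 */
function tokenizeSketch(string $input, string $delimiters): array
{
    $pairs = [];
    $token = '';
    $quote = null; // the quote character currently open, if any
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $c = $input[$i];
        if ($c === '\\' && $i + 1 < $length) {
            // A backslash escapes the next character, including another backslash.
            // Assumption: the escaping backslash itself is dropped.
            $token .= $input[++$i];
        } elseif ($quote !== null) {
            // Inside quotes, delimiters are plain text until the matching quote.
            if ($c === $quote) {
                $quote = null;
            } else {
                $token .= $c;
            }
        } elseif ($c === '"' || $c === "'") {
            $quote = $c; // the opening quote is not part of the token
        } elseif (strpos($delimiters, $c) !== false) {
            $pairs[] = [$token, $c];
            $token = '';
        } else {
            $token .= $c;
        }
    }
    $pairs[] = [$token, null];
    return $pairs;
}
```

For instance, `tokenizeSketch('"ab:cd",ef\|gh', ':,|')` returns `[['ab:cd', ','], ['ef|gh', null]]`, matching the second row of the table. Checking the backslash before the quote state means escapes also work inside quoted parts.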
Usage
use IMATHUZH\Qfq\Core\Parser\StringTokenizer;

$tokenizer = new StringTokenizer(':,|');
foreach ($tokenizer->tokenized('ab:cd,ef\|gh') as list($token, $delimiter)) {
// $token is an instance of Token class:
// $token->value is a string representation of the token
// $token->isString is true if the token is a string (quotes were used)
// $token->empty() is true for a token generated only from whitespace characters
// $delimiter === null when the end of the string is reached
}
SimpleParser
This class parses a string into a list of tokens separated by delimiters. Compared to `StringTokenizer`, the returned tokens are literal values or special objects, and the processing can be tweaked by options provided as an array in the second parameter of the constructor.
Parameters:

| Key | Type | Meaning |
|---|---|---|
| `OPTION_PARSE_NUMBERS` | bool | Convert tokens to numbers when possible |
| `OPTION_KEEP_SIGN` | bool | Create an instance of `SignedNumber` if a number has an explicit sign |
| `OPTION_KEY_IS_VALUE` | bool | Assign a key with no value its own name as the value |
| `OPTION_EMPTY` | any | The value used for empty tokens |
Note that the option `OPTION_KEY_IS_VALUE` is not used by `SimpleParser`, but it is used by derived classes.
Note: the option `OPTION_KEEP_SIGN` is used by the `jwt` column so that the claims `exp` and `nbf` can be specified either as absolute timestamps (no plus) or as relative timestamps (with a plus).
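The conversion rules above can be illustrated with a small PHP sketch that turns one raw token into a value. It is an illustration under assumptions, not the code of `SimpleParser`: `convertTokenSketch` and `SignedNumberSketch` are hypothetical names, the option keys `'parseNumbers'`, `'keepSign'` and `'empty'` merely stand in for the `OPTION_*` constants (whose actual values are not given here), and the defaults (numbers parsed, signs not kept, empty tokens become `null`) are guesses. The special values in the test are the five defaults listed in the usage example.

```php
<?php
// Sketch only: hypothetical names, not QFQ's SimpleParser or SignedNumber.

/** Marks a number written with an explicit sign, e.g. "+3600". */
final class SignedNumberSketch
{
    /** @var int|float */
    public $value;

    public function __construct($value)
    {
        $this->value = $value;
    }
}

/**
 * Converts one raw token into a value.
 * Hypothetical option keys: 'parseNumbers', 'keepSign', 'empty'.
 */
function convertTokenSketch(string $raw, bool $quoted, array $options, array $specialValues)
{
    if ($quoted) {
        return $raw; // quoted tokens always stay strings
    }
    $text = trim($raw);
    if ($text === '') {
        return $options['empty'] ?? null;
    }
    // array_key_exists (not isset) so that 'null' => null is found
    if (array_key_exists($text, $specialValues)) {
        return $specialValues[$text];
    }
    if (($options['parseNumbers'] ?? true) && is_numeric($text)) {
        $number = $text + 0; // int or float, depending on the literal
        $hasSign = $text[0] === '+' || $text[0] === '-';
        if (($options['keepSign'] ?? false) && $hasSign) {
            return new SignedNumberSketch($number);
        }
        return $number;
    }
    return $text;
}
```

With `keepSign` enabled, `+3600` becomes a wrapper object while `3600` stays a plain integer, which is the kind of distinction the `jwt` note relies on. The lookup uses `array_key_exists` rather than `isset`, since `isset` reports `false` for the `'null' => null` entry.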
Usage
use IMATHUZH\Qfq\Core\Parser\SimpleParser;

$parser = new SimpleParser(":|");
// By default five special values are configured:
// 'null' -> null
// 'true', 'yes' -> true
// 'false', 'no' -> false
// More can be defined by updating the $specialValues property:
$parser->specialValues['qfq'] = 'QFQ is great';
// This returns an array ['abc', 'efg', 123, true, 'QFQ is great']
$parser->parse("abc:efg|123|yes:qfq");
// The tokens can be iterated as follows
foreach ($parser->iterate("abc:efg|123|yes") as $token) {
// process $token
}
KVPairListParser
This class parses a list of key-value pairs into an associative array. Its constructor takes two arguments: the list separator and the key-value separator.
Usage
use IMATHUZH\Qfq\Core\Parser\KVPairListParser;

// Default separators are , and :
$parser = new KVPairListParser("|", "=");
$parser->parse("a=43|b=false|xyz='a|b'");
// result: [ 'a' => 43, 'b' => false, 'xyz' => 'a|b' ]
foreach ($parser->iterate("a=43|b=false|xyz='a|b'") as $key => $value) {
// process $key and $value
}
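A key-value list parser can reuse the same quote-aware splitting twice: once with the list separator and once, limited to the first occurrence, with the key-value separator. The PHP sketch below illustrates that idea under stated assumptions: all function names are hypothetical, values are returned as plain strings (so `43` and `false` stay `'43'` and `'false'`, unlike in `KVPairListParser`), and the `$keyIsValue` flag only imitates what `OPTION_KEY_IS_VALUE` is documented to do.

```php
<?php
// Sketch only: hypothetical helpers, not QFQ's KVPairListParser.

/**
 * Splits $text at $separator, ignoring separators inside quotes or after a
 * backslash. Quotes and backslashes are kept for later processing.
 * At most $limit pieces are returned.
 */
function splitOutsideQuotesSketch(string $text, string $separator, int $limit = PHP_INT_MAX): array
{
    $pieces = [];
    $current = '';
    $quote = null;
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        $c = $text[$i];
        if ($c === '\\' && $i + 1 < $length) {
            $current .= $c . $text[++$i]; // keep the escape sequence intact
        } elseif ($quote !== null) {
            if ($c === $quote) {
                $quote = null;
            }
            $current .= $c;
        } elseif ($c === '"' || $c === "'") {
            $quote = $c;
            $current .= $c;
        } elseif ($c === $separator && count($pieces) < $limit - 1) {
            $pieces[] = $current;
            $current = '';
        } else {
            $current .= $c;
        }
    }
    $pieces[] = $current;
    return $pieces;
}

/** Strips surrounding quotes, or else removes escaping backslashes. */
function unquoteSketch(string $token): string
{
    $token = trim($token);
    $n = strlen($token);
    if ($n >= 2 && ($token[0] === '"' || $token[0] === "'") && $token[$n - 1] === $token[0]) {
        return substr($token, 1, -1);
    }
    return stripslashes($token);
}

function parseKeyValueListSketch(string $text, string $listSep = ',', string $kvSep = ':', bool $keyIsValue = false): array
{
    $result = [];
    foreach (splitOutsideQuotesSketch($text, $listSep) as $entry) {
        if (trim($entry) === '') {
            continue; // ignore empty entries, e.g. after a trailing separator
        }
        $parts = splitOutsideQuotesSketch($entry, $kvSep, 2);
        $key = unquoteSketch($parts[0]);
        if (count($parts) === 2) {
            $result[$key] = unquoteSketch($parts[1]);
        } else {
            // Key without a value: its own name, or null (assumed default)
            $result[$key] = $keyIsValue ? $key : null;
        }
    }
    return $result;
}
```

Keeping quotes and backslashes during the first split is what allows the second split and the final unquoting to see them; for example, `parseKeyValueListSketch("a=43|b=false|xyz='a|b'", '|', '=')` gives `['a' => '43', 'b' => 'false', 'xyz' => 'a|b']`.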
MixedTypeParser
This parser understands both lists and dictionaries and both structures can be nested.
The constructor must be provided with six delimiters in one string: list separator, key-value separator, list delimiters (begin and end), and dictionary delimiters (begin and end). The default value is `,:[]{}`. It is also possible to replace the list or dictionary delimiters with spaces, in which case the parser ignores that structure. For instance:
- `new MixedTypeParser(',:[]')` can parse nested lists, but not dictionaries (the string is padded with spaces)
- `new MixedTypeParser(',:  {}')` can parse nested dictionaries, but not lists
This parser can be seen as an extension of a JSON parser: strings do not have to be enclosed in quotes.
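The nesting and the six-character delimiter string can be illustrated with a small recursive-descent parser. The PHP sketch below is hypothetical (`MixedParserSketch` is not QFQ's class): it returns all scalars as strings, supports only single-character delimiters, has a single `parse()` entry point instead of `parseList()`/`parseDictionary()`, and ignores backslash escapes. It does mirror the documented behaviour that the delimiter string is padded with spaces and that a space disables the corresponding structure.

```php
<?php
// Sketch only: a hypothetical recursive-descent parser, not QFQ's MixedTypeParser.
final class MixedParserSketch
{
    private string $listSep, $kvSep, $listOpen, $listClose, $dictOpen, $dictClose;
    private string $text = '';
    private int $pos = 0;

    public function __construct(string $delimiters = ',:[]{}')
    {
        // Pad to six characters; a space disables the corresponding structure.
        [$this->listSep, $this->kvSep, $this->listOpen, $this->listClose,
            $this->dictOpen, $this->dictClose] = str_split(str_pad($delimiters, 6, ' '));
    }

    // Parses one value (list, dictionary or string); the input must end after it.
    public function parse(string $text)
    {
        $this->text = $text;
        $this->pos = 0;
        $value = $this->parseValue();
        if ($this->peek() !== '') {
            throw new RuntimeException("Unexpected input at offset {$this->pos}");
        }
        return $value;
    }

    // Skips whitespace and returns the next character, or '' at the end.
    private function peek(): string
    {
        while ($this->pos < strlen($this->text) && strpos(" \t\r\n", $this->text[$this->pos]) !== false) {
            $this->pos++;
        }
        return $this->text[$this->pos] ?? '';
    }

    private function parseValue()
    {
        $c = $this->peek(); // never a space, so disabled (space) delimiters never match
        if ($c === $this->listOpen || $c === $this->dictOpen) {
            $isDict = $c === $this->dictOpen;
            return $this->parseSequence($isDict ? $this->dictClose : $this->listClose, $isDict);
        }
        return $this->parseString();
    }

    // Parses list items or key-value entries up to the closing delimiter.
    private function parseSequence(string $close, bool $isDict): array
    {
        $this->pos++; // consume the opening delimiter
        $result = [];
        while ($this->peek() !== $close) {
            if ($result !== []) {
                $this->expect($this->listSep); // separator between entries
            }
            if ($isDict) {
                $key = $this->parseString();
                $this->expect($this->kvSep);
                $result[$key] = $this->parseValue();
            } else {
                $result[] = $this->parseValue();
            }
        }
        $this->pos++; // consume the closing delimiter
        return $result;
    }

    private function expect(string $char): void
    {
        if ($this->peek() !== $char) {
            throw new RuntimeException("Expected '$char' at offset {$this->pos}");
        }
        $this->pos++;
    }

    // Reads a quoted string, or a bare string up to the next active delimiter.
    private function parseString(): string
    {
        $quote = $this->peek();
        if ($quote === '"' || $quote === "'") {
            $end = strpos($this->text, $quote, $this->pos + 1);
            if ($end === false) {
                throw new RuntimeException("Unterminated string at offset {$this->pos}");
            }
            $value = substr($this->text, $this->pos + 1, $end - $this->pos - 1);
            $this->pos = $end + 1;
            return $value;
        }
        $stops = str_replace(' ', '', $this->listSep . $this->kvSep . $this->listClose . $this->dictClose);
        $start = $this->pos;
        while ($this->pos < strlen($this->text) && strpos($stops, $this->text[$this->pos]) === false) {
            $this->pos++;
        }
        return rtrim(substr($this->text, $start, $this->pos - $start));
    }
}
```

Because bare strings also stop at the key-value separator, an unquoted `:` inside a list item is rejected by this sketch; quoting the item (`'a:b'`) avoids that. With `',:  {}'`, a `[` is just an ordinary character, so `{k: [1]}` yields `['k' => '[1]']`.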
Usage
use IMATHUZH\Qfq\Core\Parser\MixedTypeParser;

$parser = new MixedTypeParser(',:[]{}', [ /* options */ ]);
$parser->parse('[0, { a: 14, b: 16 }, abc]');
$parser->parseList('abc, [x, y, z], {a:15}, xyz');
$parser->parseDictionary('num:15, arr:[x, y, z], dict:{a:15}, str:xyz');
Note: there is no meaningful `iterate()` method.