Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Description
The Tokenizer (TokenizerText) faithfully records what sort of string it has processed using different token types - STRING1, STRING2, LONG_STRING1, LONG_STRING2.
Sometimes it matters (N-Triples), sometimes it doesn't (Turtle).
Instead of 4 token types (5 if you include the existing STRING token), it is proposed to use a single token type, STRING, and to record the actual string type seen separately.
This makes it simpler to work with non-text formats, where strings have no concept of quotes, and with any format that accepts any string form.
The specific cases (e.g. N-Triples) can still test for the details of the string syntax seen but the token type is the conceptual "superclass" STRING type.
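As a rough illustration of the proposal, the sketch below uses one STRING token type and keeps the quoting form in a separate field. All class, enum and method names here (Token, TokenType, StringStyle, StringChecks) are hypothetical and are not taken from the Jena code base. It also assumes that STRING2 denotes the double-quoted short form, which is the only form N-Triples accepts.

```java
// Hypothetical sketch; none of these names are the actual Jena API.
enum TokenType { STRING, IRI, KEYWORD }

// The quoting form the tokenizer saw. NONE covers non-text formats
// whose strings have no concept of quotes.
enum StringStyle { STRING1, STRING2, LONG_STRING1, LONG_STRING2, NONE }

final class Token {
    private final TokenType type;
    private final String image;
    private final StringStyle style;

    private Token(TokenType type, String image, StringStyle style) {
        this.type = type;
        this.image = image;
        this.style = style;
    }

    // Every string token has type STRING, whatever its quoting form.
    static Token string(String image, StringStyle style) {
        return new Token(TokenType.STRING, image, style);
    }

    TokenType type() { return type; }
    String image() { return image; }
    StringStyle style() { return style; }
}

final class StringChecks {
    // Turtle-like check: any string form is acceptable.
    static boolean isString(Token t) {
        return t.type() == TokenType.STRING;
    }

    // N-Triples-like check: also inspect the recorded quoting form.
    // Assumption: STRING2 is the double-quoted short form.
    static boolean isNTriplesString(Token t) {
        return t.type() == TokenType.STRING && t.style() == StringStyle.STRING2;
    }
}
```

With this shape, a parser that does not care about quoting tests only the token type, while a stricter parser adds one check on the recorded style.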