[JENA-1285] Have one Tokenizer token for strings. - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: Jena 3.3.0
Component/s: RIOT
Labels: None

Description

The Tokenizer (TokenizerText) faithfully records what sort of string it has processed using different token types - STRING1, STRING2, LONG_STRING1, LONG_STRING2.

Sometimes it matters (N-Triples), sometimes it doesn't (Turtle).

Turtle rule for strings

N-Triples rule for strings

Instead of four token types (five, counting the existing STRING token), it is proposed to use a single token type, STRING, and to record the actual string form seen separately.

This makes it simpler to work with non-text formats, where strings have no concept of quotes, and with any format that accepts every string form.

The specific cases (e.g. N-Triples) can still test for the details of the string syntax seen but the token type is the conceptual "superclass" STRING type.
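The proposal can be illustrated with a minimal, self-contained sketch. It is not the code from the linked pull request: the class `StringTokenSketch`, its nested `Token`, `TokenType` and `StringType` types, and the helper methods are hypothetical names chosen for illustration. It also assumes that STRING1 and LONG_STRING1 denote the single-quote forms and STRING2 and LONG_STRING2 the double-quote forms, and it ignores escape sequences.

```java
// Hypothetical sketch: one STRING token type, with the quote form kept as a separate detail.
class StringTokenSketch {

    enum TokenType { STRING /* other token types omitted */ }

    // Assumed mapping: 1 = single-quote forms, 2 = double-quote forms.
    enum StringType { STRING1, STRING2, LONG_STRING1, LONG_STRING2 }

    static final class Token {
        final TokenType type;
        final String image;           // string content without the delimiters
        final StringType stringType;  // null when the source format has no quoting

        Token(TokenType type, String image, StringType stringType) {
            this.type = type;
            this.image = image;
            this.stringType = stringType;
        }
    }

    // Classify a complete quoted lexical form. Escapes are not handled in this sketch.
    static Token tokenize(String lexical) {
        // Test the triple-quoted forms first: they also start with a single quote character.
        if (isDelimited(lexical, "'''"))    return make(lexical, 3, StringType.LONG_STRING1);
        if (isDelimited(lexical, "\"\"\"")) return make(lexical, 3, StringType.LONG_STRING2);
        if (isDelimited(lexical, "'"))      return make(lexical, 1, StringType.STRING1);
        if (isDelimited(lexical, "\""))     return make(lexical, 1, StringType.STRING2);
        throw new IllegalArgumentException("Not a quoted string: " + lexical);
    }

    // A string from a non-text format: no quotes were ever seen.
    static Token unquoted(String value) {
        return new Token(TokenType.STRING, value, null);
    }

    // A Turtle-like consumer only needs the conceptual STRING type.
    static boolean acceptedByTurtleLikeParser(Token t) {
        return t.type == TokenType.STRING;
    }

    // An N-Triples-like consumer additionally checks the quote form actually seen.
    static boolean acceptedByNTriplesLikeParser(Token t) {
        return t.type == TokenType.STRING && t.stringType == StringType.STRING2;
    }

    private static boolean isDelimited(String s, String quote) {
        return s.length() >= 2 * quote.length() && s.startsWith(quote) && s.endsWith(quote);
    }

    private static Token make(String lexical, int quoteLength, StringType stringType) {
        String image = lexical.substring(quoteLength, lexical.length() - quoteLength);
        return new Token(TokenType.STRING, image, stringType);
    }
}
```

In this sketch a consumer that does not care about quoting tests only `type`, while a stricter consumer also inspects `stringType`. A token built by `unquoted` carries no string form at all, which is the non-text case the description mentions.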

Issue Links

links to

GitHub Pull Request #213

People

Assignee: Andy Seaborne

Reporter: Andy Seaborne

Votes: 0

Watchers: 3

Dates

Created: 01/Feb/17 10:54

Updated: 10/May/17 15:35

Resolved: 09/Feb/17 13:10