  1. Apache Jena
  2. JENA-1285

Have one Tokenizer token for strings.

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Jena 3.3.0
    • Component/s: RIOT
    • Labels: None

    Description

      The Tokenizer (TokenizerText) faithfully records what sort of string it has processed using different token types - STRING1, STRING2, LONG_STRING1, LONG_STRING2.

      Sometimes it matters (N-Triples), sometimes it doesn't (Turtle).

      Turtle rule for strings

      N-Triples rule for strings

      Instead of 4 token types (5 if you include the existing STRING token), it is proposed to use one token type, STRING, and to record the actual string type seen separately.

      This makes it simpler to work with non-text formats, where strings have no concept of quotes, and with any format that accepts any string form.

      The specific cases (e.g. N-Triples) can still test for the details of the string syntax seen but the token type is the conceptual "superclass" STRING type.
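
      The proposal can be illustrated with a minimal, self-contained sketch. It is not the TokenizerText code: the class StringTokenSketch, the StringForm enum and its constant names, and the two accept methods are all hypothetical names chosen for illustration. It assumes only what the description states, plus the fact that N-Triples permits only the double-quoted short form.

```java
public class StringTokenSketch {

    /** Token types; STRING is the only string token type (illustrative subset). */
    public enum TokenType { STRING, IRI, DOT }

    /** How the string was written in the input. Hypothetical names, not Jena's. */
    public enum StringForm {
        SINGLE_QUOTED, DOUBLE_QUOTED, LONG_SINGLE_QUOTED, LONG_DOUBLE_QUOTED,
        UNQUOTED // e.g. a string read from a non-text (binary) format
    }

    public static final class Token {
        public final TokenType type;
        public final String image;
        public final StringForm form; // null for non-string tokens

        private Token(TokenType type, String image, StringForm form) {
            this.type = type;
            this.image = image;
            this.form = form;
        }

        /** A string token: the type is always STRING, the form is recorded separately. */
        public static Token string(String image, StringForm form) {
            return new Token(TokenType.STRING, image, form);
        }

        public static Token other(TokenType type, String image) {
            return new Token(type, image, null);
        }
    }

    /** Turtle-style check: any string form is acceptable, so only the type is tested. */
    public static boolean turtleAccepts(Token t) {
        return t.type == TokenType.STRING;
    }

    /** N-Triples-style check: the type must be STRING and the form double-quoted. */
    public static boolean ntriplesAccepts(Token t) {
        return t.type == TokenType.STRING && t.form == StringForm.DOUBLE_QUOTED;
    }

    public static void main(String[] args) {
        Token longString = Token.string("hello", StringForm.LONG_DOUBLE_QUOTED);
        Token shortString = Token.string("hello", StringForm.DOUBLE_QUOTED);
        System.out.println(turtleAccepts(longString));    // true
        System.out.println(ntriplesAccepts(longString));  // false
        System.out.println(turtleAccepts(shortString));   // true
        System.out.println(ntriplesAccepts(shortString)); // true
    }
}
```

      With this shape, a parser that does not care about quoting tests a single token type, while a strict parser inspects the recorded form in addition.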

            People

              Assignee: Andy Seaborne
              Reporter: Andy Seaborne
              Votes: 0
              Watchers: 3
