Apache Jena / JENA-2188

Escape % in TokenizerText#fatal

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Jena 4.2.0
    • Fix Version/s: Jena 4.3.0
    • Component/s: RIOT
    • Labels: None

    Description

      A "%" close to a syntax error can cause TokenizerText#fatal to throw an UnknownFormatConversionException instead of a RiotParseException. This happens because the error message is passed to String#format without escaping "%", so the "%" is interpreted as the start of a format specifier. The following example contains an intentional syntax error (an additional " after the language tag):

      import java.io.ByteArrayInputStream;
      import static java.nio.charset.StandardCharsets.UTF_8;
      import org.apache.jena.rdf.model.ModelFactory;
      import org.apache.jena.riot.Lang;
      import org.apache.jena.riot.RDFParserBuilder;
      import org.junit.jupiter.api.Test;
      
      public class TokenizerTextTest {
        @Test
        public void fatal() {
          RDFParserBuilder.create().source(new ByteArrayInputStream(
            "<http://example.org/s> <http://example.org/p> \"example\"@en-US\" <http://example.org/%D8-graph>"
            .getBytes(UTF_8))).lang(Lang.NQUADS).parse(ModelFactory.createDefaultModel());
        }
      }

      This causes:

      java.util.UnknownFormatConversionException: Conversion = 'D'
      	at java.base/java.util.Formatter$FormatSpecifier.conversion(Formatter.java:2839)
      	at java.base/java.util.Formatter$FormatSpecifier.<init>(Formatter.java:2865)
      	at java.base/java.util.Formatter.parse(Formatter.java:2713)
      	at java.base/java.util.Formatter.format(Formatter.java:2655)
      	at java.base/java.util.Formatter.format(Formatter.java:2609)
      	at java.base/java.lang.String.format(String.java:2897)
      	at org.apache.jena.riot.tokens.TokenizerText.fatal(TokenizerText.java:1347)
      	at org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:773)
      	at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238)
      	at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89)
      	at org.apache.jena.atlas.iterator.PeekIterator.fill(PeekIterator.java:50)
      	at org.apache.jena.atlas.iterator.PeekIterator.next(PeekIterator.java:92)
      	at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:98)
      	at org.apache.jena.riot.lang.LangNQuads.parseOne(LangNQuads.java:78)
      	at org.apache.jena.riot.lang.LangNQuads.runParser(LangNQuads.java:53)
      	at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:43)
      	at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:181)
      	at org.apache.jena.riot.RDFParser.read(RDFParser.java:358)
      	at org.apache.jena.riot.RDFParser.parseNotUri(RDFParser.java:348)
      	at org.apache.jena.riot.RDFParser.parse(RDFParser.java:295)
      	at org.apache.jena.riot.RDFParser.parse(RDFParser.java:241)
      	at org.apache.jena.riot.RDFParser.parse(RDFParser.java:250)
      	at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:574)
      	at TokenizerTextTest.fatal(TokenizerTextTest.java:17)
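
      The failure mode can be reproduced without Jena. The following is a minimal sketch, not the actual Jena fix: the class FormatEscapeSketch and the helpers escapePercent and failsAsPattern are hypothetical names introduced for illustration. It shows that text containing "%D" breaks when used as a format pattern, and two common ways to avoid that: doubling each "%", or passing the text as an argument to a fixed "%s" pattern.

```java
import java.util.UnknownFormatConversionException;

// Illustrative sketch only; names are hypothetical and not part of Jena.
class FormatEscapeSketch {

    // Doubles every '%' so that String.format emits it literally.
    static String escapePercent(String text) {
        return text.replace("%", "%%");
    }

    // Returns true if using the text directly as a format pattern
    // throws UnknownFormatConversionException.
    static boolean failsAsPattern(String pattern) {
        try {
            String.format(pattern);
            return false;
        } catch (UnknownFormatConversionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Same IRI fragment as in the report; "%D" is not a valid conversion.
        String message = "Broken token: <http://example.org/%D8-graph>";

        // Unescaped text used as the pattern triggers the exception.
        System.out.println("fails unescaped: " + failsAsPattern(message));

        // Option 1: escape '%' before formatting.
        System.out.println(String.format(escapePercent(message)));

        // Option 2: keep the pattern fixed and pass the text as an argument.
        System.out.println(String.format("%s", message));
    }
}
```

      Both options print the message unchanged. Passing untrusted text only as an argument keeps the pattern under the caller's control, which is generally the simpler choice when the message does not need further formatting.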
      

            People

              Assignee: Andy Seaborne (andy)
              Reporter: Jan Martin Keil (jmkeil)