Apache Jena / JENA-2061

Fuseki XML result serializer outputs characters that are illegal per XML spec


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Jena 3.15.0
    • Fix Version/s: Jena 4.0.0
    • Component/s: Fuseki
    • Labels: None
    • Environment: We confirmed the reported behavior in three environments:

      • CentOS 8 with OpenJDK 1.8.0_282
      • macOS 10.15 with OpenJDK 13.0.2
      • macOS 10.14 with Java 8 JDK

    Description

      Due to a mistake at our end, our application inserted a literal into the triple store that included the ASCII control character ESC (0x1B, i.e. U+001B; represented below as ESC):

      PREFIX oa: <http://www.w3.org/ns/oa#>
      PREFIX our: <http://example.org/>
      
      INSERT DATA {
          our:example oa:exact "foo ESC bar" .
      }
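
For anyone who wants to reproduce this, here is a minimal sketch of how such an update could be sent with the raw control character in the request body. The endpoint URL and the dataset name `ds` are assumptions about a local Fuseki setup, and `build_update_request` is a hypothetical helper name; the function only builds the request and does not send it.

```python
import urllib.request

# Assumption: a local Fuseki server with a dataset named "ds".
UPDATE_URL = "http://localhost:3030/ds/update"

ESC = "\x1b"  # U+001B, the control character from the report

# Same update as in the report, with the real ESC character in the literal.
UPDATE = f"""PREFIX oa: <http://www.w3.org/ns/oa#>
PREFIX our: <http://example.org/>

INSERT DATA {{
    our:example oa:exact "foo {ESC} bar" .
}}
"""


def build_update_request(url: str = UPDATE_URL) -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Protocol update via direct POST."""
    return urllib.request.Request(
        url,
        data=UPDATE.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update; charset=utf-8"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it to a running server. The same character can also be written in the update text as the SPARQL codepoint escape `\u001B` instead of the raw byte.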
      

      While this was unintentional and I can't really think of a situation where inserting control characters is desirable, it is nevertheless allowed by the SPARQL and Turtle specifications, as far as I can tell. Please correct me if I'm wrong. Regardless, Fuseki accepts this update request.

      When we subsequently retrieve the data through a SELECT query with the Accept header set to application/sparql-results+xml, the XML response contains the raw ESC character, unescaped:

      SELECT ?c WHERE { ?a ?b ?c . }
      
      <?xml version="1.0"?>
      <sparql xmlns="http://www.w3.org/2005/sparql-results#">
        <head>
          <variable name="c"/>
        </head>
        <results>
          <result>
            <binding name="c">
              <literal>foo ESC bar</literal>
            </binding>
          </result>
        </results>
      </sparql>
      

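      This leads to errors when the result XML is parsed downstream: U+001B is not a legal character in XML 1.0, so a conforming parser rejects the document as not well-formed.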
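
The downstream failure can be illustrated with a small sketch that feeds the response from the report to Python's standard-library XML parser. `parses_as_xml` is a hypothetical helper name, and the document below is the XML quoted above with the real U+001B character substituted for the ESC placeholder.

```python
import xml.etree.ElementTree as ET

ESC = "\x1b"  # U+001B

# The SELECT result from the report, with the actual control character.
RESULT_XML = f"""<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="c"/>
  </head>
  <results>
    <result>
      <binding name="c">
        <literal>foo {ESC} bar</literal>
      </binding>
    </result>
  </results>
</sparql>
"""


def parses_as_xml(document: str) -> bool:
    """Return True if the document is accepted by the XML parser."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True
```

With the control character present, `parses_as_xml(RESULT_XML)` returns `False`; replacing the character with ordinary text makes the same document parse.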

      If we do a CONSTRUCT with application/rdf+xml, the Fuseki server returns a 400 Bad Request instead. I have double-checked that this is due to the presence of the ESC character.

      Edit to add: the set of valid characters per the XML spec is defined by the Char production in section 2.2 ("Characters") of the XML 1.0 specification.
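
As an illustration of that character set, the sketch below checks a string against the XML 1.0 `Char` production and optionally replaces offending characters. This is only a client-side workaround idea, not a description of how Jena handles or fixed the issue; the function names and the choice of U+FFFD as replacement are assumptions.

```python
import re

# XML 1.0, section 2.2:
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The pattern matches any single character outside that set.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, code point) pairs for characters not allowed in XML 1.0."""
    return [
        (match.start(), f"U+{ord(match.group()):04X}")
        for match in _ILLEGAL_XML_CHARS.finditer(text)
    ]


def replace_illegal_xml_chars(text: str, replacement: str = "\uFFFD") -> str:
    """Replace every character not allowed in XML 1.0 with `replacement`."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)
```

Running `find_illegal_xml_chars` on literals before they are inserted would have caught the mistake on our side. Tab, line feed and carriage return are the only characters below U+0020 that pass the check.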


          People

            Assignee: Andy Seaborne (andy)
            Reporter: Julian Gonggrijp (jgonggrijp)