  1. Apache Jena
  2. JENA-1462

RDF/XML parsing fails on newer/provisional/private URI schemes in base URI

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Jena 3.3.0, Jena 3.4.0, Jena 3.5.0, Jena 3.6.0
    • Fix Version/s: Jena 3.7.0
    • Component/s: ARQ, RDF/XML
    • Labels:
      None
    • Environment:

      Description

      RIOT fails to parse RDF/XML when the base URI uses a scheme other than http, https or file, such as ssh://.

      See https://github.com/stain/jena-test-unregistered-iana for some tests I came up with.

      The tests fail both when the base is set with xml:base and when the base URI is provided to RDFDataMgr, but not when the RDF/XML contains the full (absolute) URI.

      org.apache.jena.riot.RiotException: [line: 5, col: 40] {E214} Resolving against bad URI <ssh://example.com/nested/>: <foo.txt>
      	at org.apache.jena.riot.TestParseURISchemeBases.sshBaseRDF(TestParseURISchemeBases.java:336)
      

      This error message comes from ERR_RESOLVING_AGAINST_MALFORMED_BASE. For some reason the warning becomes an error, as the IRI Factory used for creating the base IRI within the RDF/XML parser is a bit too strict.

      However, I could not find anything in the specs that says "foreign" URI schemes should not be permitted. In any case, Jena's IANA list is probably out of date, as my tests have shown.

      This was initially detected in TAVERNA-1027, which tries to parse an RDF/XML document with the app:// URI scheme. That scheme is not registered with IANA (https://www.iana.org/assignments/uri-schemes) according to https://tools.ietf.org/html/bcp35

      However, testing Jena with other permanent and provisional schemes from the registry, such as example:// or ssh://, or with a conformant private scheme that has a domain-based name, such as org.apache.jena.test://, gives the same error.

      IMHO they should all be understood in the same way as when parsing the Turtle examples, which don't fail.
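
      As a point of comparison, generic relative-reference resolution does not depend on whether a scheme is registered. The sketch below uses java.net.URI from the JDK as a stand-in (it is not the IRI code Jena uses) to resolve foo.txt against bases with the schemes mentioned above. Only ssh://example.com/nested/ and foo.txt come from the error message; the other authorities and paths are made up for illustration.

```java
import java.net.URI;

public class SchemeResolutionSketch {
    // Resolve a relative reference against a base with java.net.URI.
    // This is a JDK stand-in for illustration, not Jena's IRI resolver.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String[] bases = {
            "http://example.com/nested/",
            "ssh://example.com/nested/",                  // base from the error message
            "example://example.com/nested/",              // hypothetical authority/path
            "app://example.com/nested/",                  // hypothetical authority/path
            "org.apache.jena.test://example.com/nested/"  // hypothetical authority/path
        };
        for (String base : bases) {
            // Each base is hierarchical, so foo.txt replaces the last path segment.
            System.out.println(resolve(base, "foo.txt"));
        }
    }
}
```

      Every base here resolves the same way regardless of scheme, which is the behaviour the Turtle tests already show in Jena.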

      I could trace this back to Jena 3.3.0, so I suspect this was introduced with JENA-1306. With versions before that, all my tests *) work.

      I'll raise a pull request with the JUnit tests, but I have not been able to find a good way to fix it.

      *) There's a separate issue: hostnames in file://example.com/etc/passwd style URIs also seem to be misparsed in RDF/XML into file:///example.com/etc/passwd. That is reported separately as JENA-1463 and goes back to 3.0.1.
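
      To make the difference between the two file URIs in that footnote concrete, here is a small sketch using java.net.URI from the JDK (again a stand-in, not Jena's parser). The class and method names are hypothetical.

```java
import java.net.URI;

public class FileUriHostSketch {
    // Report the host and path components that java.net.URI extracts.
    static String describe(String uri) {
        URI u = URI.create(uri);
        return "host=" + u.getHost() + " path=" + u.getPath();
    }

    public static void main(String[] args) {
        // Two slashes: example.com is the authority (host).
        System.out.println(describe("file://example.com/etc/passwd"));
        // Three slashes: empty authority, so example.com becomes a path segment.
        System.out.println(describe("file:///example.com/etc/passwd"));
    }
}
```

      In the first form example.com is the host and the path is /etc/passwd; in the second there is no host and the path is /example.com/etc/passwd, so the two URIs identify different resources.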

              People

              • Assignee: Andy Seaborne (andy)
              • Reporter: Stian Soiland-Reyes (stain)
              • Votes: 0
              • Watchers: 3
