Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
Jena 3.3.0, Jena 3.4.0, Jena 3.5.0, Jena 3.6.0
-
None
-
Apache Maven 3.3.9
Maven home: /usr/share/maven
Java version: 1.8.0_151, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux", version: "4.10.0-42-generic", arch: "amd64", family: "unix"
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
Description
RIOT fails to parse RDF/XML when the base URI uses a scheme other than http, https or file, such as ssh://.
See https://github.com/stain/jena-test-unregistered-iana for some tests I came up with.
Tests fail both when the base comes from xml:base and when the base URI is provided to RDFDataMgr, but not if the full URI is written out inside the RDF/XML.
org.apache.jena.riot.RiotException: [line: 5, col: 40] {E214} Resolving against bad URI <ssh://example.com/nested/>: <foo.txt>
at org.apache.jena.riot.TestParseURISchemeBases.sshBaseRDF(TestParseURISchemeBases.java:336)
This error message comes from ERR_RESOLVING_AGAINST_MALFORMED_BASE. For some reason the warning becomes an error, because the IRI factory used for creating the base IRI within the RDF/XML parser is a bit too strict.
However I could not find anything in the specs:
- https://www.w3.org/TR/2014/REC-rdf-syntax-grammar-20140225/
- https://www.w3.org/TR/2009/REC-xmlbase-20090128/
- https://www.ietf.org/rfc/rfc3986
that says "foreign" URI schemes should not be permitted. In any case, Jena's list of IANA schemes is probably out of date, as my tests show.
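To illustrate that relative reference resolution does not depend on the scheme, here is a minimal sketch using the JDK's java.net.URI. This is not Jena's IRI code, and java.net.URI follows the older RFC 2396 rather than RFC 3986, but for a simple case like this the result is the same. The class and method names are made up for the example.

```java
import java.net.URI;

// Hypothetical helper, not part of Jena: resolves a relative
// reference against a base URI using only the JDK.
class BaseResolutionSketch {

    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String[] bases = {
            "http://example.com/nested/",
            "ssh://example.com/nested/",
            "app://example.com/nested/",
            "org.apache.jena.test://example.com/nested/"
        };
        for (String base : bases) {
            // Each line ends in /nested/foo.txt, whatever the scheme.
            System.out.println(resolve(base, "foo.txt"));
        }
    }
}
```

The scheme is carried through untouched in every case, which is the behaviour I would expect from the RDF/XML parser as well.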
This was initially detected in TAVERNA-1027, which tries to parse an RDF/XML document with the app:// URI scheme. That scheme is not registered with IANA (https://www.iana.org/assignments/uri-schemes) under the procedures of BCP 35 (https://tools.ietf.org/html/bcp35).
However, testing Jena with other schemes, both permanent and provisional ones from the registry such as example:// and ssh://, and a conformant private scheme with a domain-based name, org.apache.jena.test://, gives the same error.
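For reference, the scheme names used in the tests are all syntactically valid under RFC 3986 section 3.1 (a letter followed by letters, digits, "+", "-" or "."). A small sketch of that check, with a hypothetical class name and no relation to how Jena validates schemes:

```java
import java.util.regex.Pattern;

// Hypothetical helper: checks a scheme name against the RFC 3986
// grammar  scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
class SchemeSyntaxSketch {

    private static final Pattern SCHEME =
            Pattern.compile("[A-Za-z][A-Za-z0-9+.-]*");

    static boolean isValidSchemeName(String scheme) {
        return SCHEME.matcher(scheme).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ssh", "app", "example", "org.apache.jena.test"}) {
            System.out.println(s + " -> " + isValidSchemeName(s)); // true for each
        }
    }
}
```

So whether a scheme is registered with IANA or not, a syntax-level check has no reason to reject it.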
IMHO they should all be understood in the same way as when parsing the Turtle examples, which don't fail.
I could trace this back to Jena 3.3.0, so I suspect this was introduced with JENA-1306. With versions before that, all my tests *) work.
I'll raise a pull request with the JUnit tests, but I have not been able to find a good way to fix it.
*) There's a separate issue where hostnames in file://example.com/etc/passwd style URIs also seem to be misparsed in RDF/XML into file:///example.com/etc/passwd. It is reported separately as JENA-1463 and goes back to 3.0.1.
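To make the difference between those two file URIs concrete, here is a sketch using java.net.URI (chosen only for illustration; it is not the code path RIOT uses, and the class name is hypothetical). In the first form example.com is the host; in the second it has become the first path segment.

```java
import java.net.URI;

// Hypothetical helper: exposes the host and path components
// that the JDK's URI parser sees.
class FileHostSketch {

    static String hostOf(String uri) {
        return URI.create(uri).getHost(); // null when there is no host
    }

    static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String withHost = "file://example.com/etc/passwd";
        String miswritten = "file:///example.com/etc/passwd";
        // host=example.com path=/etc/passwd
        System.out.println("host=" + hostOf(withHost) + " path=" + pathOf(withHost));
        // host=null path=/example.com/etc/passwd
        System.out.println("host=" + hostOf(miswritten) + " path=" + pathOf(miswritten));
    }
}
```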
Issue Links
- breaks
-
TAVERNA-1027 COMBINE parsing fails with updated Jena - app URI scheme not supported
- Done
- is related to
-
JENA-1463 RDF/XML parsing of file://hostname/ base URI miswrites URI
- Closed
- relates to
-
JENA-1306 Provide detailed setup for RIOT parsing with a parser builder.
- Closed
- links to