Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Jena 3.0.1
None
Description
Parsing a Turtle file served over http:// or https:// that has no @base and uses relative IRI references wrongly uses the current directory (as a file:/// IRI) as the base.
The command line
stain@biggie:/tmp$ riot https://cdn.rawgit.com/stain/3d49908d790c5678faee302ba17f4a43/raw/bceb24ee6bfcaa60c4508779bc5f09ec367876f1/nobase.ttl
where that URL returns Content-Type: text/turtle;charset=utf-8 with the body:
@prefix : <#> . <> :a <#test> .
is wrongly parsed by the riot command line tool to be relative to the current directory:
<file:///tmp/https:/cdn.rawgit.com/stain/3d49908d790c5678faee302ba17f4a43/raw/bceb24ee6bfcaa60c4508779bc5f09ec367876f1/nobase.ttl> <file:///tmp/https:/cdn.rawgit.com/stain/3d49908d790c5678faee302ba17f4a43/raw/bceb24ee6bfcaa60c4508779bc5f09ec367876f1/nobase.ttl#a> <file:///tmp/https:/cdn.rawgit.com/stain/3d49908d790c5678faee302ba17f4a43/raw/bceb24ee6bfcaa60c4508779bc5f09ec367876f1/nobase.ttl#test> .
The expected output would be the same as supplying the same URI as a --base:
stain@biggie:/tmp$ riot --base https://cdn.rawgit.com/stain/3d49908d790c5678faee302ba17f4a43/raw/bceb24ee6bfcaa60c4508779bc5f09ec367876f1/nobase.ttl https://cdn.rawgit.com/stain/3d49908d790c5678faee302ba17f4a43/raw/bceb24ee6bfcaa60c4508779bc5f09ec367876f1/nobase.ttl
<https://cdn.rawgit.com/stain/3d49908d790c5678faee302ba17f4a43/raw/bceb24ee6bfcaa60c4508779bc5f09ec367876f1/nobase.ttl> <https://cdn.rawgit.com/stain/3d49908d790c5678faee302ba17f4a43/raw/bceb24ee6bfcaa60c4508779bc5f09ec367876f1/nobase.ttl#a> <https://cdn.rawgit.com/stain/3d49908d790c5678faee302ba17f4a43/raw/bceb24ee6bfcaa60c4508779bc5f09ec367876f1/nobase.ttl#test> .
(The exception is when a Content-Location header is provided or HTTP redirection has been followed, in which case the resulting IRI should be used as the base.)
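The expected behaviour can be sketched in a few lines of Python. This is only an illustration of the RFC 3986 rules referenced below, not Jena's code: the helper names choose_base and resolve_all and the example.org URLs are made up for this sketch, and urllib.parse.urljoin stands in for a full IRI resolver.

```python
from urllib.parse import urljoin


def choose_base(request_url, final_url=None, content_location=None):
    """Pick the base for a retrieved document that has no @base of its own.

    Hypothetical helper: start from the URL actually retrieved (after any
    redirects), then let a Content-Location header override it.
    """
    base = final_url or request_url
    if content_location:
        # Content-Location may itself be relative, so resolve it first.
        base = urljoin(base, content_location)
    return base


def resolve_all(base, refs):
    """Resolve each relative reference against the chosen base."""
    return [urljoin(base, ref) for ref in refs]


if __name__ == "__main__":
    # The three references used in the Turtle body: <>, :a (i.e. <#a>), <#test>.
    refs = ["", "#a", "#test"]
    base = choose_base("https://example.org/data/nobase.ttl")
    for iri in resolve_all(base, refs):
        print(iri)
```

With no redirect and no Content-Location, all three references stay under the https:// URL that was requested, and the current directory plays no part.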
Relevant specs:
https://www.w3.org/TR/turtle/#sec-iri-references
https://www.ietf.org/rfc/rfc3986 sections 5.1 and 5.2