Details
Bug
Status: Open
Critical
Resolution: Unresolved
3.3.0
None
None
Description
The SPARQL implementation of the KiWi triple store seems to have issues with the evaluation of OPTIONAL segments of SPARQL queries. Test data and test queries are provided below.
Data
<urn:test.org:place.1> rdf:type schema:Palce ;
    schema:geo <urn:test.org:geo.1> ;
    schema:name "Place 1" .
<urn:test.org:geo.1> rdf:type schema:GeoCoordinates ;
    schema:latitude "16"^^xsd:double ;
    schema:longitude "17"^^xsd:double ;
    schema:elevation "123"^^xsd:int .
<urn:test.org:place.2> rdf:type schema:Palce ;
    schema:geo <urn:test.org:geo.2> ;
    schema:name "Place 2" .
<urn:test.org:geo.2> rdf:type schema:GeoCoordinates ;
    schema:latitude "15"^^xsd:double ;
    schema:longitude "16"^^xsd:double ;
    schema:elevation "99"^^xsd:int .
<urn:test.org:place.3> rdf:type schema:Palce ;
    schema:geo <urn:test.org:geo.3> ;
    schema:name "Place 3" .
<urn:test.org:geo.3> rdf:type schema:GeoCoordinates ;
    schema:latitude "15"^^xsd:double ;
    schema:longitude "17"^^xsd:double .
<urn:test.org:place.4> rdf:type schema:Palce ;
    schema:geo <urn:test.org:geo.4> ;
    schema:name "Place 4" .
<urn:test.org:geo.4> rdf:type schema:GeoCoordinates ;
    schema:longitude "17"^^xsd:double ;
    schema:elevation "123"^^xsd:int .
Importantly, `geo.1` and `geo.2` define latitude, longitude and elevation. `geo.3` has no elevation, and `geo.4` is missing the latitude to simulate invalid geo coordinate data.
Test Case 1
The following query uses an `OPTIONAL` graph pattern that includes both `schema:latitude` and `schema:longitude`. This assumes a user only wants lat/long values for locations that define both.
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
  ?entity schema:geo ?location
  OPTIONAL {
    ?location schema:latitude ?lat .
    ?location schema:longitude ?long .
  }
}
This query translates to the following algebra
(base <http://example/base/>
  (prefix ((schema: <http://schema.org/>)
           (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
    (leftjoin
      (bgp (triple ?entity schema:geo ?location))
      (bgp
        (triple ?location schema:latitude ?lat)
        (triple ?location schema:longitude ?long)
      ))))
The expected results are
entity,location,lat,long
urn:test.org:place.1,urn:test.org:geo.1,16,17
urn:test.org:place.2,urn:test.org:geo.2,15,16
urn:test.org:place.3,urn:test.org:geo.3,15,17
urn:test.org:place.4,urn:test.org:geo.4,,
All four locations are expected in the result set as the `OPTIONAL` graph pattern is translated to a `leftjoin` with `triple ?entity schema:geo ?location`.
However, for `geo.4` no value is expected for either `?lat` or `?long`, because this resource only defines a longitude and therefore does not match
(bgp
(triple ?location schema:latitude ?lat)
(triple ?location schema:longitude ?long)
)
Marmotta responds with
entity,location,lat,long
urn:test.org:place.1,urn:test.org:geo.1,16,17
urn:test.org:place.2,urn:test.org:geo.2,15,16
urn:test.org:place.3,urn:test.org:geo.3,15,17
urn:test.org:place.4,urn:test.org:geo.4,,17
Note that the longitude is returned for the resource `geo.4`, although the `OPTIONAL` graph pattern as a whole does not match it.
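The expected result for Test Case 1 can be reproduced with a minimal sketch of the `leftjoin` evaluation described above. This is not Marmotta or KiWi code: the helper names `match`, `bgp`, `compatible` and `leftjoin` are hypothetical, the data is reduced to the properties the query touches, and literals are kept as plain strings without datatypes.

```python
# Test data reduced to the triples relevant for the query (assumption of this sketch).
TRIPLES = [
    ("urn:test.org:place.1", "schema:geo", "urn:test.org:geo.1"),
    ("urn:test.org:geo.1", "schema:latitude", "16"),
    ("urn:test.org:geo.1", "schema:longitude", "17"),
    ("urn:test.org:place.2", "schema:geo", "urn:test.org:geo.2"),
    ("urn:test.org:geo.2", "schema:latitude", "15"),
    ("urn:test.org:geo.2", "schema:longitude", "16"),
    ("urn:test.org:place.3", "schema:geo", "urn:test.org:geo.3"),
    ("urn:test.org:geo.3", "schema:latitude", "15"),
    ("urn:test.org:geo.3", "schema:longitude", "17"),
    ("urn:test.org:place.4", "schema:geo", "urn:test.org:geo.4"),
    ("urn:test.org:geo.4", "schema:longitude", "17"),  # no latitude
]

def match(pattern, triples, binding):
    """Yield every extension of `binding` under which `pattern` matches a triple."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is already bound differently.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new

def bgp(patterns, triples):
    """Evaluate a basic graph pattern: all triple patterns must match together."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [s for b in solutions for s in match(pattern, triples, b)]
    return solutions

def compatible(a, b):
    """Two solutions are compatible if they agree on all shared variables."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def leftjoin(left, right):
    """Keep every left solution; extend it only with compatible right solutions."""
    result = []
    for l in left:
        merged = [{**l, **r} for r in right if compatible(l, r)]
        result.extend(merged or [l])
    return result

left = bgp([("?entity", "schema:geo", "?location")], TRIPLES)
right = bgp([("?location", "schema:latitude", "?lat"),
             ("?location", "schema:longitude", "?long")], TRIPLES)
rows = leftjoin(left, right)
for row in rows:
    print(row["?entity"], row["?location"], row.get("?lat"), row.get("?long"))
```

In this sketch the row for `place.4` carries neither `?lat` nor `?long`, because the right-hand `bgp` produces no solution for `geo.4` at all. A partially bound row such as the one Marmotta returns cannot come out of a `leftjoin` over a complete `bgp`.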
Test Case 2
As a variation we now also include the `schema:elevation` in the OPTIONAL graph pattern.
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
  ?entity schema:geo ?location
  OPTIONAL {
    ?location schema:latitude ?lat .
    ?location schema:longitude ?long .
    ?location schema:elevation ?alt .
  }
}
This query translates to the following algebra
(base <http://example/base/>
  (prefix ((schema: <http://schema.org/>)
           (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
    (leftjoin
      (bgp (triple ?entity schema:geo ?location))
      (bgp
        (triple ?location schema:latitude ?lat)
        (triple ?location schema:longitude ?long)
        (triple ?location schema:elevation ?alt)
      ))))
The expected result would have 4 result rows where `lat`, `long` and `alt` values are only provided for `geo.1` and `geo.2`.
entity,location,lat,long,alt
urn:test.org:place.1,urn:test.org:geo.1,16,17,123
urn:test.org:place.2,urn:test.org:geo.2,15,16,99
urn:test.org:place.3,urn:test.org:geo.3,,,
urn:test.org:place.4,urn:test.org:geo.4,,,
With this query Marmotta behaves very strangely: the results depend on the ordering of the triple patterns in the `OPTIONAL` graph pattern. I will not include all variations but just provide two examples:
OPTIONAL {
?location schema:latitude ?lat .
?location schema:longitude ?long .
?location schema:elevation ?alt .
}
gives
entity,location,lat,long,alt
urn:test.org:place.1,urn:test.org:geo.1,1.6E1,1.7E1,123
urn:test.org:place.2,urn:test.org:geo.2,1.5E1,1.6E1,99
urn:test.org:place.4,urn:test.org:geo.4,,1.7E1,123
while
OPTIONAL {
?location schema:longitude ?long .
?location schema:latitude ?lat .
?location schema:elevation ?alt .
}
gives
entity,location,long,lat,alt
urn:test.org:place.1,urn:test.org:geo.1,1.7E1,1.6E1,123
urn:test.org:place.2,urn:test.org:geo.2,1.6E1,1.5E1,99
This behavior further indicates that `OPTIONAL` graph patterns are processed incorrectly.
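That the ordering of triple patterns inside the `OPTIONAL` group should not matter can be checked with a small sketch. It is a simplified model, not Marmotta code: it relies on the assumption that every pattern in the group shares the `?location` subject and that each property has at most one value per resource, which holds for the test data. The names `GEO`, `PATTERNS`, `optional_group` and `evaluate` are hypothetical.

```python
from itertools import permutations

# location -> properties present in the test data (literals as plain strings)
GEO = {
    "urn:test.org:geo.1": {"latitude": "16", "longitude": "17", "elevation": "123"},
    "urn:test.org:geo.2": {"latitude": "15", "longitude": "16", "elevation": "99"},
    "urn:test.org:geo.3": {"latitude": "15", "longitude": "17"},
    "urn:test.org:geo.4": {"longitude": "17", "elevation": "123"},
}
PATTERNS = [("latitude", "?lat"), ("longitude", "?long"), ("elevation", "?alt")]

def optional_group(props, patterns):
    """Bindings of the OPTIONAL group for one location.

    The group matches as a whole or not at all, so a single missing
    property yields no bindings, whatever was matched before it.
    """
    bindings = {}
    for prop, var in patterns:
        if prop not in props:
            return {}
        bindings[var] = props[prop]
    return bindings

def evaluate(patterns):
    """Evaluate the OPTIONAL group for every location, in the given pattern order."""
    return {loc: optional_group(props, patterns) for loc, props in GEO.items()}

# All 6 orderings of the three triple patterns must give the same bindings.
results = [evaluate(order) for order in permutations(PATTERNS)]
assert all(r == results[0] for r in results)

for location, bindings in results[0].items():
    print(location, bindings)
```

Under these assumptions only `geo.1` and `geo.2` receive bindings, and `geo.3` and `geo.4` stay unbound for every ordering, which matches the expected result listed for this test case.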
Test Case 3
Modifying the query to
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
  ?entity schema:geo ?location
  OPTIONAL {
    ?location schema:latitude ?lat .
    ?location schema:longitude ?long .
  }
  OPTIONAL {
    ?location schema:elevation ?alt .
  }
}
gives a result similar to Test Case 1: there are 4 results, but `geo.4` again gets the unexpected value for `?long`.
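For comparison, the bindings one would expect from two independent `OPTIONAL` groups can be sketched as follows. This is a simplified model under the same assumptions as the test data (one `?location` per entity, at most one value per property), not a statement about Marmotta internals; `GEO` and `optional_group` are hypothetical names.

```python
# location -> properties present in the test data (literals as plain strings)
GEO = {
    "urn:test.org:geo.1": {"latitude": "16", "longitude": "17", "elevation": "123"},
    "urn:test.org:geo.2": {"latitude": "15", "longitude": "16", "elevation": "99"},
    "urn:test.org:geo.3": {"latitude": "15", "longitude": "17"},
    "urn:test.org:geo.4": {"longitude": "17", "elevation": "123"},
}

def optional_group(props, required):
    """Bindings for one OPTIONAL group, or {} when any of its properties is missing."""
    if any(prop not in props for prop, _ in required):
        return {}
    return {var: props[prop] for prop, var in required}

rows = {}
for location, props in GEO.items():
    # First OPTIONAL: latitude and longitude must both be present.
    row = optional_group(props, [("latitude", "?lat"), ("longitude", "?long")])
    # Second OPTIONAL: elevation is matched independently of the first group.
    row.update(optional_group(props, [("elevation", "?alt")]))
    rows[location] = row

for location, row in rows.items():
    print(location, row.get("?lat"), row.get("?long"), row.get("?alt"))
```

In this model `geo.3` keeps its `lat`/`long` pair without an `alt`, while `geo.4` gets only `alt`: its longitude stays unbound because the first group does not match without a latitude.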
Test Case 4
This test case assumes that the user requires `lat` and `long` and optionally wants the `alt` but only for resources that do have a valid location.
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
  ?entity schema:geo ?location
  OPTIONAL {
    ?location schema:latitude ?lat .
    ?location schema:longitude ?long .
    OPTIONAL {
      ?location schema:elevation ?alt .
    }
  }
}
This translates to the following algebra
(base <http://example/base/>
  (prefix ((schema: <http://schema.org/>)
           (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
    (leftjoin
      (bgp (triple ?entity schema:geo ?location))
      (leftjoin
        (bgp
          (triple ?location schema:latitude ?lat)
          (triple ?location schema:longitude ?long)
        )
        (bgp (triple ?location schema:elevation ?alt))))))
So the `lat` and `long` values are combined with `alt` by a `leftjoin`. The result is then combined in another `leftjoin` with the results of `?entity schema:geo ?location`. The expected results are therefore as follows
entity,location,lat,long,alt
urn:test.org:place.1,urn:test.org:geo.1,16,17,123
urn:test.org:place.2,urn:test.org:geo.2,15,16,99
urn:test.org:place.3,urn:test.org:geo.3,15,17,
urn:test.org:place.4,urn:test.org:geo.4,,,
Marmotta however returns
entity,location,lat,long,alt
urn:test.org:place.1,urn:test.org:geo.1,16,17,123
urn:test.org:place.2,urn:test.org:geo.2,15,16,99
urn:test.org:place.3,urn:test.org:geo.3,15,17,
urn:test.org:place.4,urn:test.org:geo.4,,17,123
All test cases show that OPTIONAL query segments are not correctly evaluated by the SPARQL implementation of the KiWi triple store.