  1. Marmotta (Retired)
  2. MARMOTTA-603

SPARQL OPTIONAL issues

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: KiWi Triple Store
    • Labels: None

    Description

      The SPARQL implementation of the KiWi triple store seems to have issues with the evaluation of OPTIONAL segments of SPARQL queries. Test data and test queries are provided below.

      Data

          @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
          @prefix schema: <http://schema.org/> .
          @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

          <urn:test.org:place.1> rdf:type schema:Place ;
              schema:geo <urn:test.org:geo.1> ;
              schema:name "Place 1" .

          <urn:test.org:geo.1> rdf:type schema:GeoCoordinates ;
              schema:latitude "16"^^xsd:double ;
              schema:longitude "17"^^xsd:double ;
              schema:elevation "123"^^xsd:int .

          <urn:test.org:place.2> rdf:type schema:Place ;
              schema:geo <urn:test.org:geo.2> ;
              schema:name "Place 2" .

          <urn:test.org:geo.2> rdf:type schema:GeoCoordinates ;
              schema:latitude "15"^^xsd:double ;
              schema:longitude "16"^^xsd:double ;
              schema:elevation "99"^^xsd:int .

          <urn:test.org:place.3> rdf:type schema:Place ;
              schema:geo <urn:test.org:geo.3> ;
              schema:name "Place 3" .

          <urn:test.org:geo.3> rdf:type schema:GeoCoordinates ;
              schema:latitude "15"^^xsd:double ;
              schema:longitude "17"^^xsd:double .

          <urn:test.org:place.4> rdf:type schema:Place ;
              schema:geo <urn:test.org:geo.4> ;
              schema:name "Place 4" .

          <urn:test.org:geo.4> rdf:type schema:GeoCoordinates ;
              schema:longitude "17"^^xsd:double ;
              schema:elevation "123"^^xsd:int .
      

      Note that `geo.1` and `geo.2` define latitude, longitude, and elevation. `geo.3` has no elevation, and `geo.4` has no latitude, which simulates invalid geo coordinate data.

      Test Case 1

      The following query uses an OPTIONAL graph pattern that includes both `schema:latitude` and `schema:longitude`. It assumes the user wants lat/long values only for locations that define both.

          PREFIX schema: <http://schema.org/>
          PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
          SELECT * WHERE {
              ?entity schema:geo ?location
              OPTIONAL {
                  ?location schema:latitude ?lat .
                  ?location schema:longitude ?long .
              }
          }
      

      This query translates to the following algebra

          (base <http://example/base/>
              (prefix ((schema: <http://schema.org/>)
                      (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
                  (leftjoin
                      (bgp (triple ?entity schema:geo ?location))
                      (bgp
                          (triple ?location schema:latitude ?lat)
                          (triple ?location schema:longitude ?long)
                      ))))
      

      The expected results are

          entity,location,lat,long
          urn:test.org:place.1,urn:test.org:geo.1,16,17
          urn:test.org:place.2,urn:test.org:geo.2,15,16
          urn:test.org:place.3,urn:test.org:geo.3,15,17
          urn:test.org:place.4,urn:test.org:geo.4,,
      

      All four locations are expected in the result set because the `OPTIONAL` graph pattern is translated to a `leftjoin` whose left-hand side is `triple ?entity schema:geo ?location`.

      However, for `geo.4` no value is expected for either `?lat` or `?long`, because this resource only defines a longitude and therefore does not match

          (bgp
              (triple ?location schema:latitude ?lat)
              (triple ?location schema:longitude ?long)
          )
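
The `leftjoin` semantics described above can be checked against the test data with a small in-memory evaluator. The following is a minimal sketch in Python, not Marmotta or KiWi code: the helper names `match`, `bgp`, and `leftjoin` are hypothetical, URIs are shortened to their local names, and literals are reduced to plain integers.

```python
# Illustrative sketch of BGP matching and leftjoin (OPTIONAL) semantics.
# Helper names are hypothetical; this is not Marmotta or KiWi code.

# Test data from the report, reduced to the predicates the queries use.
TRIPLES = [
    ("place.1", "geo", "geo.1"), ("place.2", "geo", "geo.2"),
    ("place.3", "geo", "geo.3"), ("place.4", "geo", "geo.4"),
    ("geo.1", "latitude", 16), ("geo.1", "longitude", 17), ("geo.1", "elevation", 123),
    ("geo.2", "latitude", 15), ("geo.2", "longitude", 16), ("geo.2", "elevation", 99),
    ("geo.3", "latitude", 15), ("geo.3", "longitude", 17),
    ("geo.4", "longitude", 17), ("geo.4", "elevation", 123),
]


def match(pattern, triple, binding):
    """Return binding extended so that pattern matches triple, else None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # A variable must agree with any value it is already bound to.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def bgp(patterns):
    """Solutions that satisfy every triple pattern (a conjunction)."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in TRIPLES:
                candidate = match(pattern, triple, solution)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


def leftjoin(left, right):
    """Keep every left solution; extend it only with compatible right ones."""
    joined = []
    for l in left:
        compatible = [r for r in right
                      if all(l.get(k, v) == v for k, v in r.items())]
        joined.extend({**l, **r} for r in compatible)
        if not compatible:
            joined.append(l)  # the OPTIONAL part stays entirely unbound
    return joined


left = bgp([("?entity", "geo", "?location")])
right = bgp([("?location", "latitude", "?lat"),
             ("?location", "longitude", "?long")])
rows = leftjoin(left, right)
for row in rows:
    print(row["?entity"], row["?location"],
          row.get("?lat", ""), row.get("?long", ""))
```

In this sketch the `place.4` row carries neither `?lat` nor `?long`, because `geo.4` never produces a solution for the two-pattern `bgp`.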
      

      Marmotta responds with

          entity,location,lat,long
          urn:test.org:place.1,urn:test.org:geo.1,16,17
          urn:test.org:place.2,urn:test.org:geo.2,15,16
          urn:test.org:place.3,urn:test.org:geo.3,15,17
          urn:test.org:place.4,urn:test.org:geo.4,,17
      

      Note that the longitude is returned for the resource `geo.4`, even though the `OPTIONAL` pattern as a whole does not match it.

      Test Case 2

      As a variation we now also include the `schema:elevation` in the OPTIONAL graph pattern.

          PREFIX schema: <http://schema.org/>
          PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
          SELECT * WHERE {
              ?entity schema:geo ?location
              OPTIONAL {
                  ?location schema:latitude ?lat .
                  ?location schema:longitude ?long .
                  ?location schema:elevation ?alt .
              }
          }
      

      This query translates to the following algebra

          (base <http://example/base/>
              (prefix ((schema: <http://schema.org/>)
                         (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
                  (leftjoin
                      (bgp (triple ?entity schema:geo ?location))
                      (bgp
                          (triple ?location schema:latitude ?lat)
                          (triple ?location schema:longitude ?long)
                          (triple ?location schema:elevation ?alt)
                      ))))
      

      The expected result would have 4 result rows where `lat`, `long` and `alt` values are only provided for `geo.1` and `geo.2`.

          entity,location,lat,long,alt
          urn:test.org:place.1,urn:test.org:geo.1,16,17,123
          urn:test.org:place.2,urn:test.org:geo.2,15,16,99
          urn:test.org:place.3,urn:test.org:geo.3,,,
          urn:test.org:place.4,urn:test.org:geo.4,,,
      

      With this query Marmotta behaves very strangely: the results depend on the ordering of the triple patterns in the `OPTIONAL` graph pattern. I will not include all variations, but just provide two examples:

              OPTIONAL {
                  ?location schema:latitude ?lat .
                  ?location schema:longitude ?long .
                  ?location schema:elevation ?alt .
              }
      

      gives

          entity,location,lat,long,alt
          urn:test.org:place.1,urn:test.org:geo.1,1.6E1,1.7E1,123
          urn:test.org:place.2,urn:test.org:geo.2,1.5E1,1.6E1,99
          urn:test.org:place.4,urn:test.org:geo.4,,1.7E1,123
      

      while

              OPTIONAL {
                  ?location schema:longitude ?long .
                  ?location schema:latitude ?lat .
                  ?location schema:elevation ?alt .
              }
      

      gives

          entity,location,long,lat,alt
          urn:test.org:place.1,urn:test.org:geo.1,1.7E1,1.6E1,123
          urn:test.org:place.2,urn:test.org:geo.2,1.6E1,1.5E1,99
      

      This behavior further indicates that `OPTIONAL` graph patterns are processed incorrectly.
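
A basic graph pattern is a conjunction of triple patterns, so reordering them should not change the result. The sketch below checks this for the three-pattern `OPTIONAL` of Test Case 2 by evaluating every ordering. It is a minimal illustration in Python under stated assumptions, not Marmotta code: the helpers `match`, `bgp`, and `leftjoin` are hypothetical, URIs are shortened to local names, and literals are plain integers.

```python
from itertools import permutations

# Illustrative sketch only; helper names are hypothetical, not Marmotta APIs.
# Test data from the report, reduced to the predicates the queries use.
TRIPLES = [
    ("place.1", "geo", "geo.1"), ("place.2", "geo", "geo.2"),
    ("place.3", "geo", "geo.3"), ("place.4", "geo", "geo.4"),
    ("geo.1", "latitude", 16), ("geo.1", "longitude", 17), ("geo.1", "elevation", 123),
    ("geo.2", "latitude", 15), ("geo.2", "longitude", 16), ("geo.2", "elevation", 99),
    ("geo.3", "latitude", 15), ("geo.3", "longitude", 17),
    ("geo.4", "longitude", 17), ("geo.4", "elevation", 123),
]


def match(pattern, triple, binding):
    """Return binding extended so that pattern matches triple, else None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def bgp(patterns):
    """Solutions that satisfy every triple pattern (a conjunction)."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in TRIPLES:
                candidate = match(pattern, triple, solution)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


def leftjoin(left, right):
    """Keep every left solution; extend it only with compatible right ones."""
    joined = []
    for l in left:
        compatible = [r for r in right
                      if all(l.get(k, v) == v for k, v in r.items())]
        joined.extend({**l, **r} for r in compatible)
        if not compatible:
            joined.append(l)
    return joined


OPTIONAL_PATTERNS = [
    ("?location", "latitude", "?lat"),
    ("?location", "longitude", "?long"),
    ("?location", "elevation", "?alt"),
]

left = bgp([("?entity", "geo", "?location")])
distinct_results = set()
for order in permutations(OPTIONAL_PATTERNS):
    rows = leftjoin(left, bgp(list(order)))
    # Compare results as sets of bindings so that row order is ignored.
    distinct_results.add(frozenset(frozenset(r.items()) for r in rows))

print(len(distinct_results))  # number of different results over all orderings
```

Under these assumptions all six orderings yield one and the same result with four rows, which is why the differing outputs above point to a defect rather than to an ambiguity in the query.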

      Test Case 3

      Modifying the query to

          PREFIX schema: <http://schema.org/>
          PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
          SELECT * WHERE {
              ?entity schema:geo ?location
              OPTIONAL {
                  ?location schema:latitude ?lat .
                  ?location schema:longitude ?long .
              }
              OPTIONAL {
                  ?location schema:elevation ?alt .
              }
          }
      

      returns results similar to Test Case 1: there are 4 rows, but the `geo.4` row carries the unexpected value for `?long`.

      Test Case 4

      This test case assumes that the user requires `lat` and `long` and optionally wants `alt`, but only for resources that have a valid location.

          PREFIX schema: <http://schema.org/>
          PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
          SELECT * WHERE {
              ?entity schema:geo ?location
              OPTIONAL {
                  ?location schema:latitude ?lat .
                  ?location schema:longitude ?long .
                  OPTIONAL {
                      ?location schema:elevation ?alt .
                  }
              }
          }
      

      This translates to the following algebra

          (base <http://example/base/>
              (prefix ((schema: <http://schema.org/>)
                         (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
                  (leftjoin
                      (bgp (triple ?entity schema:geo ?location))
                      (leftjoin
                          (bgp
                              (triple ?location schema:latitude ?lat)
                              (triple ?location schema:longitude ?long)
                          )
                          (bgp (triple ?location schema:elevation ?alt))))))
      

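      So the `lat` and `long` values are combined with `alt` in an inner `leftjoin`. The result is then the right-hand side of another `leftjoin` whose left-hand side is `?entity schema:geo ?location`. Because `geo.3` matches the latitude/longitude pattern, it keeps `lat` and `long` and only `alt` stays unbound; `geo.4` has no latitude, so none of the three variables is bound for it. The expected results are therefore as follows

          entity,location,lat,long,alt
          urn:test.org:place.1,urn:test.org:geo.1,16,17,123
          urn:test.org:place.2,urn:test.org:geo.2,15,16,99
          urn:test.org:place.3,urn:test.org:geo.3,15,17,
          urn:test.org:place.4,urn:test.org:geo.4,,,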
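
The nested algebra of this test case can be evaluated by hand with a small in-memory sketch, which also makes the difference to the two sequential `OPTIONAL` blocks of Test Case 3 visible. This is a minimal Python illustration under stated assumptions, not Marmotta code: the helpers `match`, `bgp`, and `leftjoin` are hypothetical, URIs are shortened to local names, and literals are plain integers.

```python
# Illustrative sketch only; helper names are hypothetical, not Marmotta APIs.
# Test data from the report, reduced to the predicates the queries use.
TRIPLES = [
    ("place.1", "geo", "geo.1"), ("place.2", "geo", "geo.2"),
    ("place.3", "geo", "geo.3"), ("place.4", "geo", "geo.4"),
    ("geo.1", "latitude", 16), ("geo.1", "longitude", 17), ("geo.1", "elevation", 123),
    ("geo.2", "latitude", 15), ("geo.2", "longitude", 16), ("geo.2", "elevation", 99),
    ("geo.3", "latitude", 15), ("geo.3", "longitude", 17),
    ("geo.4", "longitude", 17), ("geo.4", "elevation", 123),
]


def match(pattern, triple, binding):
    """Return binding extended so that pattern matches triple, else None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def bgp(patterns):
    """Solutions that satisfy every triple pattern (a conjunction)."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in TRIPLES:
                candidate = match(pattern, triple, solution)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


def leftjoin(left, right):
    """Keep every left solution; extend it only with compatible right ones."""
    joined = []
    for l in left:
        compatible = [r for r in right
                      if all(l.get(k, v) == v for k, v in r.items())]
        joined.extend({**l, **r} for r in compatible)
        if not compatible:
            joined.append(l)
    return joined


base = bgp([("?entity", "geo", "?location")])
lat_long = bgp([("?location", "latitude", "?lat"),
                ("?location", "longitude", "?long")])
alt = bgp([("?location", "elevation", "?alt")])

# Test Case 4: the elevation OPTIONAL is nested inside the lat/long OPTIONAL.
nested = leftjoin(base, leftjoin(lat_long, alt))
# Test Case 3: two sequential OPTIONAL blocks on the same level.
sequential = leftjoin(leftjoin(base, lat_long), alt)

for name, rows in (("nested", nested), ("sequential", sequential)):
    for row in rows:
        print(name, row["?entity"], row.get("?lat", ""),
              row.get("?long", ""), row.get("?alt", ""))
```

In this sketch the nested form leaves `place.4` without any optional binding, while the sequential form binds only `?alt` for it. In neither form does `place.4` receive a `?long` value.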
      

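      Marmotta, however, returns the following, binding `?long` and `?alt` for `geo.4` although its latitude/longitude pattern does not match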

          entity,location,lat,long,alt
          urn:test.org:place.1,urn:test.org:geo.1,16,17,123
          urn:test.org:place.2,urn:test.org:geo.2,15,16,99
          urn:test.org:place.3,urn:test.org:geo.3,15,17,
          urn:test.org:place.4,urn:test.org:geo.4,,17,123
      

      All test cases show that OPTIONAL query segments are not correctly evaluated by the SPARQL implementation of the KiWi triple store.

          People

            Assignee: Unassigned
            Reporter: Rupert Westenthaler (rwesten)