Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
Jena 2.12.0
-
None
Description
The following query demonstrates a query plan, seen internally, that is considered sub-optimal:
SELECT DISTINCT ?domainName { { ?uri ?p ?o } UNION { ?sub ?p ?uri FILTER(isIRI(?uri)) } BIND(str(?uri) as ?s) FILTER(STRSTARTS(?s, "http://")) BIND(IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName) }
ARQ optimises this as follows:
(distinct
  (project (?domainName)
    (filter (strstarts ?s "http://")
      (extend ((?s (str ?uri)) (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/")))))
        (union
          (bgp (triple ?uri ?p ?o))
          (filter (isIRI ?uri)
            (bgp (triple ?sub ?p ?uri))))))))
This makes the query engine do a lot of work, because it computes both BIND expressions for many possible solutions that the filter then rejects. For those solutions it would only be necessary to compute the first, simple BIND expression.
It would be better if the query was planned as follows:
(distinct
  (project (?domainName)
    (extend (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/"))))
      (filter (strstarts ?s "http://")
        (extend (?s (str ?uri))
          (union
            (bgp (triple ?uri ?p ?o))
            (filter (isIRI ?uri)
              (bgp (triple ?sub ?p ?uri)))))))))
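To make the difference between the two plans concrete, here is a rough Python sketch that counts how often the second BIND expression is evaluated under each plan. It is only an illustration: `original_plan`, `split_plan`, `domain_name` and `strbefore` are hypothetical names, the input is assumed to be a plain list of strings standing in for `str(?uri)`, and the `IRI(...)` wrapper is left out.

```python
def strbefore(s, sep):
    # Mimics SPARQL STRBEFORE: returns "" when sep does not occur in s.
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def domain_name(s):
    # CONCAT("http://", STRBEFORE(SUBSTR(?s, 8), "/")).
    # SPARQL SUBSTR is 1-based, so position 8 is Python index 7.
    return "http://" + strbefore(s[7:], "/")


def original_plan(uris):
    # filter above a single extend: both BINDs run for every row.
    evals, out = 0, set()
    for s in uris:              # s stands in for the first BIND, str(?uri)
        d = domain_name(s)      # second BIND, computed before filtering
        evals += 1
        if s.startswith("http://"):
            out.add(d)
    return out, evals


def split_plan(uris):
    # extend split around the filter: the second BIND runs only for survivors.
    evals, out = 0, set()
    for s in uris:
        if s.startswith("http://"):
            out.add(domain_name(s))
            evals += 1
    return out, evals
```

Both functions return the same set of domain names; only the evaluation count differs, and the gap grows with the share of rows the filter rejects.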
Essentially, when we try to push a filter through an extend and determine that we cannot, we should check whether we can split the extend instead. The filter can then be placed between the two halves, resulting in a partial push.
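The splitting decision described above can be sketched roughly as follows. This is not ARQ's implementation. It is a minimal Python model that assumes an extend is an ordered list of `(variable, variables_used_by_expression)` pairs and a filter is reduced to the set of variables it mentions. The name `split_extend` is hypothetical.

```python
def split_extend(assignments, filter_vars):
    """Split an extend's assignments around a filter.

    Returns (below, above): assignments that must stay below the filter,
    and assignments that can be moved above it.
    """
    assigned = {var for var, _ in assignments}
    needed = filter_vars & assigned
    if not needed:
        # The filter uses nothing the extend defines, so it can be
        # pushed below the whole extend.
        return [], list(assignments)
    # Keep everything up to and including the last assignment the filter
    # needs. Later assignments may only refer to earlier ones, so this
    # prefix also keeps whatever the needed assignments depend on.
    last = max(i for i, (var, _) in enumerate(assignments) if var in needed)
    return list(assignments[:last + 1]), list(assignments[last + 1:])


# The extend from the query above: ?s is built from ?uri, ?domainName from ?s.
extend = [("s", {"uri"}), ("domainName", {"s"})]
below, above = split_extend(extend, {"s"})
```

For the example query, `below` holds only the `?s` assignment and `above` holds the `?domainName` assignment, matching the preferred plan. When `above` comes back empty, the filter depends on the last assignment and no split is possible.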
Note that a user can rewrite the original query to yield this plan by making the second BIND a project expression:
SELECT DISTINCT (IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName) { { ?uri ?p ?o } UNION { ?sub ?p ?uri FILTER(isIRI(?uri)) } BIND(str(?uri) as ?s) FILTER(STRSTARTS(?s, "http://")) }