Apache Jena / JENA-779

Filter placement should be able to break up extend

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Jena 2.12.0
    • Fix Version/s: Jena 2.12.1
    • Component/s: ARQ, Optimizer
    • Labels: None

    Description

      The following query produces a query plan, seen internally, that is sub-optimal:

      SELECT DISTINCT ?domainName
      {
        { ?uri ?p ?o }
        UNION
        {
          ?sub ?p ?uri
          FILTER(isIRI(?uri))
        }
        BIND(str(?uri) as ?s)
        FILTER(STRSTARTS(?s, "http://"))
        BIND(IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName)
      }
      

      ARQ optimises this as follows:

      (distinct
        (project (?domainName)
          (filter (strstarts ?s "http://")
            (extend ((?s (str ?uri)) (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/")))))
              (union
                (bgp (triple ?uri ?p ?o))
                (filter (isIRI ?uri)
                  (bgp (triple ?sub ?p ?uri))))))))
      

      This makes the query engine do a lot of work: it computes both BIND expressions for every candidate solution, even though many of those solutions are then rejected by the filter. For the rejected solutions, only the first, simple BIND needs to be computed.

      It would be better if the query was planned as follows:

      (distinct
        (project (?domainName)
          (extend (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/"))))
            (filter (strstarts ?s "http://")
              (extend (?s (str ?uri))
                (union
                  (bgp (triple ?uri ?p ?o))
                  (filter (isIRI ?uri)
                    (bgp (triple ?sub ?p ?uri)))))))))
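
To make the difference between the two plans concrete, here is a small Python sketch that counts how often the second BIND expression is evaluated under each plan. It is a toy model and not ARQ code: the function names and the sample `?uri` values are assumptions made up for illustration.

```python
# Toy model (not ARQ): count evaluations of the second, more expensive BIND
# expression under each plan. The input values are hypothetical.

def domain_name(s):
    # Mirrors IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s, 8), "/"))).
    rest = s[7:]  # SUBSTR(?s, 8) is 1-based, so it starts at Python index 7
    # STRBEFORE yields the empty string when the separator does not occur.
    head = rest.split("/", 1)[0] if "/" in rest else ""
    return "http://" + head

def run_original_plan(uris):
    # filter above a single extend: both BINDs run before the filter.
    evaluations, results = 0, set()
    for uri in uris:
        s = str(uri)               # first BIND
        d = domain_name(s)         # second BIND, computed for every row
        evaluations += 1
        if s.startswith("http://"):  # FILTER(STRSTARTS(?s, "http://"))
            results.add(d)           # set models DISTINCT
    return results, evaluations

def run_split_plan(uris):
    # extend / filter / extend: the second BIND runs only on surviving rows.
    evaluations, results = 0, set()
    for uri in uris:
        s = str(uri)               # first BIND
        if not s.startswith("http://"):
            continue               # rejected before the second BIND
        d = domain_name(s)
        evaluations += 1
        results.add(d)
    return results, evaluations

uris = [
    "http://example.org/a",
    "http://example.org/b",
    "urn:x:1",
    "mailto:a@example.org",
    "https://example.com/c",
]

print(run_original_plan(uris))
print(run_split_plan(uris))
```

Both functions return the same result set for this input; only the number of evaluations of the second expression differs (once per input row in the first, once per row that passes the filter in the second).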
      

      Essentially, when we try to push a filter through an extend and determine that it cannot be pushed through as a whole, we should check whether the extend can be split instead, so that the filter is pushed part of the way.
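
The splitting idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the logic of the attached patches: an extend is modelled as an ordered list of `(variable, expression)` pairs, the filter is reduced to the set of variables it mentions, and `split_extend` is a hypothetical helper name.

```python
# Minimal sketch (not the patch): split an extend so a filter can sit between
# the assignments it depends on and the ones it does not.

def split_extend(assignments, filter_vars):
    """Return (lower, upper) lists of assignments.

    assignments: ordered list of (var, expr) pairs; a later expression may
                 refer to variables assigned earlier in the list.
    filter_vars: set of variable names mentioned by the filter.

    The filter can be placed above `lower` and below `upper`.
    """
    last_needed = -1
    for i, (var, _expr) in enumerate(assignments):
        if var in filter_vars:
            last_needed = i  # the filter needs everything up to here
    lower = assignments[: last_needed + 1]
    upper = assignments[last_needed + 1 :]
    return lower, upper

extend = [
    ("?s", '(str ?uri)'),
    ("?domainName", '(iri (concat "http://" (strbefore (substr ?s 8) "/")))'),
]
lower, upper = split_extend(extend, {"?s"})
print(lower)  # assignments that must stay below the filter
print(upper)  # assignments that can move above the filter
```

If `lower` comes back empty, the filter does not depend on the extend at all and can be pushed through it entirely; if `upper` is empty, splitting gains nothing and the filter stays above the extend.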

      Note that a user can rewrite the original query to yield this plan by making the second BIND a project expression, like so:

      SELECT DISTINCT (IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName)
      {
        { ?uri ?p ?o }
        UNION
        {
          ?sub ?p ?uri
          FILTER(isIRI(?uri))
        }
        BIND(str(?uri) as ?s)
        FILTER(STRSTARTS(?s, "http://"))
      }
      

      Attachments

        1. JENA-779-filter-extend_distinct.patch
          18 kB
          Andy Seaborne
        2. JENA-779-filter-extend-extend
          2 kB
          Andy Seaborne
        3. JENA-779-single-extend.patch
          2 kB
          Andy Seaborne
        4. JENA-779.patch
          7 kB
          Rob Vesse

            People

              andy Andy Seaborne
              rvesse Rob Vesse