Apache Jena / JENA-473

ARQ should be able to optimize implicit joins and implicit left joins


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Jena 2.11.0
    • Component/s: ARQ

    Description

      There is a class of useful optimizations, usually referred to as implicit joins, that ARQ currently does not attempt to apply.

      A trivial example is as follows:

      SELECT *
      WHERE
      {
      ?x ?p1 ?o1 .
      ?y ?p2 ?o2 .
      FILTER(?x = ?y)
      }

      Currently this requires us to compute a cross product and then apply the filter. Even with streaming evaluation this can be extremely costly. The aim of this optimization is to produce a query like the following:

      SELECT *
      WHERE
      {
      ?x ?p1 ?o1 .
      ?x ?p2 ?o2 .
      BIND(?x AS ?y)
      }
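The rewrite above can be sketched in a few lines. This is only an illustration, not ARQ's implementation: triple patterns are modelled as plain Python tuples of strings, and the function name rewrite_implicit_join is hypothetical.

```python
# Hypothetical model: a triple pattern is a (subject, predicate, object) tuple
# of strings, and a variable is any string starting with "?".

def rewrite_implicit_join(patterns, keep, drop):
    """Replace every occurrence of `drop` with `keep` and return the rewritten
    patterns plus a (keep, drop) pair standing for BIND(keep AS drop)."""
    rewritten = [
        tuple(keep if term == drop else term for term in pattern)
        for pattern in patterns
    ]
    return rewritten, (keep, drop)


# The trivial example: FILTER(?x = ?y) over two unrelated triple patterns.
patterns = [("?x", "?p1", "?o1"), ("?y", "?p2", "?o2")]
new_patterns, bind = rewrite_implicit_join(patterns, keep="?x", drop="?y")
```

The BIND keeps ?y in the results, so SELECT * still returns the same variables as the original query.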

      This optimization can also be applied to some left joins where the implicit join condition spans the join, e.g.

      SELECT *
      WHERE
      {
      ?x ?p1 ?o1 .
      OPTIONAL { ?y ?p2 ?o2 . FILTER(?x = ?y) }
      }

      This can be thought of as a generalization of TransformFilterEquality that covers the case where both operands are variables. Because both are variables, we need to be careful about when we apply this optimization: when = is used, we must guarantee that substituting one variable for the other does not alter the semantics of the query.

      I believe the optimization is safe to apply provided that we can guarantee (as far as possible) that one variable is non-literal. This can be done by inspecting the positions in which the mentioned variables are used and ensuring that at least one of the variables occurs in the graph, subject or predicate position, since literals cannot appear in those positions.
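The position check described above might look roughly like the following. It is a minimal sketch under stated assumptions, not ARQ code: patterns are modelled as (graph, subject, predicate, object) tuples with None for the default graph, and the names occurs_in_non_literal_position and safe_to_merge are hypothetical. The reason the check matters is that = compares literals by value (for example, the integers 1 and 01 are equal but are different terms), whereas a join matches terms exactly.

```python
# Hypothetical model: a quad pattern is a (graph, subject, predicate, object)
# tuple; graph is None when the pattern is in the default graph.

def occurs_in_non_literal_position(var, quads):
    """True if `var` is used as graph, subject or predicate in any pattern."""
    return any(var in (g, s, p) for g, s, p, _obj in quads)


def safe_to_merge(var_a, var_b, quads):
    """The rewrite is treated as safe if at least one of the two variables
    is guaranteed to be bound to a non-literal."""
    return (occurs_in_non_literal_position(var_a, quads)
            or occurs_in_non_literal_position(var_b, quads))


# Both variables are subjects, so neither can be a literal.
subject_case = [(None, "?x", "?p1", "?o1"), (None, "?y", "?p2", "?o2")]
# Both variables appear only as objects, so both could be literals.
object_case = [(None, "?s1", "?p1", "?x"), (None, "?s2", "?p2", "?y")]

subject_ok = safe_to_merge("?x", "?y", subject_case)
object_ok = safe_to_merge("?x", "?y", object_case)
```

In this sketch the object-only case is rejected and the filter would be left in place.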

      Safety for left joins is a little more complex: we must ensure that at least one of the variables occurs in the right-hand side (RHS), and we can only make the substitution in the RHS, as otherwise we change the join semantics.
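One way to picture the left-join rule is the sketch below. It assumes the non-literal position check has already passed, models triple patterns as tuples of strings, and uses the hypothetical name rewrite_left_join. The choice of which variable to replace is an assumption made for illustration, not something the description specifies.

```python
# Hypothetical model: a triple pattern is a (subject, predicate, object) tuple
# of strings, and a variable is any string starting with "?".

def variables(patterns):
    """Collect every variable mentioned in a list of triple patterns."""
    return {term for pattern in patterns for term in pattern
            if term.startswith("?")}


def rewrite_left_join(lhs, rhs, var_a, var_b):
    """Substitute inside the RHS only. Returns None when neither variable
    occurs in the RHS, meaning the rewrite is not attempted."""
    rhs_vars = variables(rhs)
    if var_a not in rhs_vars and var_b not in rhs_vars:
        return None
    # Assumption: drop a variable that occurs in the RHS and keep the other.
    drop, keep = (var_b, var_a) if var_b in rhs_vars else (var_a, var_b)
    new_rhs = [tuple(keep if term == drop else term for term in pattern)
               for pattern in rhs]
    # The LHS is returned untouched; (keep, drop) stands for a
    # BIND(keep AS drop) placed inside the OPTIONAL.
    return lhs, new_rhs, (keep, drop)


lhs = [("?x", "?p1", "?o1")]
rhs = [("?y", "?p2", "?o2")]
result = rewrite_left_join(lhs, rhs, "?x", "?y")
```

For the OPTIONAL example this yields an RHS of ?x ?p2 ?o2 with ?y bound to ?x inside the OPTIONAL, while the LHS is left exactly as written.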

      Attachments

        1. impl-join.csv
          2 kB
          Rob Vesse
        2. impl-join-opt.csv
          2 kB
          Rob Vesse
        3. impl-join-opt-linearized.csv
          2 kB
          Rob Vesse


          People

            Assignee: Rob Vesse (rvesse)
            Reporter: Rob Vesse (rvesse)