Derby
  1. Derby
  2. DERBY-4405

Transformation to inner join not performed for certain three-way joins

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 10.6.1.0
    • Fix Version/s: 10.6.1.0
    • Component/s: SQL
    • Labels:
      None
    • Bug behavior facts:
      Performance

      Description

      In the CROSS JOIN section in the reference manual (http://db.apache.org/derby/docs/dev/ref/rrefsqljcrossjoin.html) there are three examples that are supposed to be equivalent. However, the performance differs significantly between the different queries.

      The queries use the tours db and look like this:

      (1)
      SELECT * FROM CITIES LEFT OUTER JOIN
      (FLIGHTS CROSS JOIN COUNTRIES)
      ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
      WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'

      (2)
      SELECT * FROM CITIES LEFT OUTER JOIN
      FLIGHTS INNER JOIN COUNTRIES ON 1=1
      ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
      WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'

      (3)
      SELECT * FROM CITIES LEFT OUTER JOIN
      (SELECT * FROM FLIGHTS, COUNTRIES) S
      ON CITIES.AIRPORT = S.ORIG_AIRPORT
      WHERE S.COUNTRY_ISO_CODE = 'US'

      When executed in ij, (1) and (2) need 6 seconds to complete, whereas (3) completes in 50 ms.

      The query plans for (1) and (2) use nested loop joins and table scans. (3) uses a combination of hash join and nested loop join, and index scans as well as table scans.

      It looks like (3) has been rewritten from a left outer join to an inner join internally. This is fine because all rows that have the right-side columns filled with NULLs will be filtered out by the predicate S.COUNTRY_ISO_CODE='US', so the extra rows generated by the outer join will not be returned.

      This optimization should also be possible for (1) and (2). We should improve the logic so that those joins are transformed too. The transformation happens in HalfOuterJoinNode.transformOuterJoins().

      1. derby-4405-2.stat
        0.2 kB
        Dag H. Wanvik
      2. derby-4405-2.diff
        39 kB
        Dag H. Wanvik
      3. derby-4405.stat
        0.4 kB
        Dag H. Wanvik
      4. derby-4405.diff
        38 kB
        Dag H. Wanvik

        Issue Links

          Activity

          Hide
          Dag H. Wanvik added a comment -

          Uploading a patch which lets the two variant outer join statements be rewritten to
          inner join, by looking not just at the top level inner table for match when analyzing
          the null intolerant predicates, but checking against tables in nested inner join (inner join in the inner position of an outer join) as well.
          Added a repro test case to JoinNode which checks for the presence of a hash join to
          indicate the rewrite did happen (not used before the rewrite).

          Running regressions.

          Show
          Dag H. Wanvik added a comment - Uploading a patch which lets the two variant outer join statements be rewritten to inner join, by looking not just at the top level inner table for match when analyzing the null intolerant predicates, but checking against tables in nested inner join (inner join in the inner position of an outer join) as well. Added a repro test case to JoinNode which checks for the presence of a hash join to indicate the rewrite did happen (not used before the rewrite). Running regressions.
          Hide
          Bryan Pendleton added a comment -

          Hi Dag, you should assign this issue to yourself. Thanks for investigating this!

          Why did you need to add a parameter to the fillInReferencedTableMap() method?
          Would it be possible to just make the new behavior be unconditional, and have
          JoinNode always fill in the table map positions from its children?

          Show
          Bryan Pendleton added a comment - Hi Dag, you should assign this issue to yourself. Thanks for investigating this! Why did you need to add a parameter to the fillInReferencedTableMap() method? Would it be possible to just make the new behavior be unconditional, and have JoinNode always fill in the table map positions from its children?
          Hide
          Dag H. Wanvik added a comment -

          Thanks for looking at the patch, Bryan. I introduced the boolean so as not the change the behavior of the
          call to fillInReferencedTableMap from FromBaseTable#LOJgetReferencedTables, since that call was not involved
          in solving this issue. I will investigate some more of that usage could benefit from the closure as well. LOJgetReferencedTables is used by the LOJ linearization logic, and I didn't study it yet.

          Show
          Dag H. Wanvik added a comment - Thanks for looking at the patch, Bryan. I introduced the boolean so as not the change the behavior of the call to fillInReferencedTableMap from FromBaseTable#LOJgetReferencedTables, since that call was not involved in solving this issue. I will investigate some more of that usage could benefit from the closure as well. LOJgetReferencedTables is used by the LOJ linearization logic, and I didn't study it yet.
          Hide
          Dag H. Wanvik added a comment -

          Regressions passed.

          Show
          Dag H. Wanvik added a comment - Regressions passed.
          Hide
          Dag H. Wanvik added a comment -

          Trying to understand the code for the outer join reordering a.k.a. linearization that Derby performs, cf.
          HalfOuterJoinNode#LOJ_reorderable.

          Is anyone familiar with what the rationale/intuition for this transformation is? I did not find it
          described in the Tuning guide (http://db.apache.org/derby/docs/10.5/tuning/ctuntransform55045.html).
          It does mention the simplification to inner joins, though.

          This fairly recent paper which does address it [1] in much detail, but Derby's logic is fairly limited..

          [1] Galindo-Legaria, C. & Rosenthal, A.: "Outerjoin simplification and reordering for query optimization", ACM Transactions on Database Systems, Vol 22, No 1, March 1997.

          Show
          Dag H. Wanvik added a comment - Trying to understand the code for the outer join reordering a.k.a. linearization that Derby performs, cf. HalfOuterJoinNode#LOJ_reorderable. Is anyone familiar with what the rationale/intuition for this transformation is? I did not find it described in the Tuning guide ( http://db.apache.org/derby/docs/10.5/tuning/ctuntransform55045.html ). It does mention the simplification to inner joins, though. This fairly recent paper which does address it [1] in much detail, but Derby's logic is fairly limited.. [1] Galindo-Legaria, C. & Rosenthal, A.: "Outerjoin simplification and reordering for query optimization", ACM Transactions on Database Systems, Vol 22, No 1, March 1997.
          Hide
          Dag H. Wanvik added a comment -

          Looking at the reordering done by Derby in
          HalfOuterJoinNode#LOJ_reorderable, it seems to be buggy; filed
          DERBY-4471 for that.

          Instead of modifying fillInReferencedTableMap when used from
          transformOuterJoins, I see that I possibly could use the method
          ResultSetNode#LOJgetReferencedTables instead, which does look inside
          join (inner and outer join) nodes in the way we want. I will make a
          simplified patch which uses that method.

          Btw, the simplication of outer join to inner join is defined by
          identity #3 in [1] referred to above.

          Show
          Dag H. Wanvik added a comment - Looking at the reordering done by Derby in HalfOuterJoinNode#LOJ_reorderable, it seems to be buggy; filed DERBY-4471 for that. Instead of modifying fillInReferencedTableMap when used from transformOuterJoins, I see that I possibly could use the method ResultSetNode#LOJgetReferencedTables instead, which does look inside join (inner and outer join) nodes in the way we want. I will make a simplified patch which uses that method. Btw, the simplication of outer join to inner join is defined by identity #3 in [1] referred to above.
          Hide
          Dag H. Wanvik added a comment -

          Uploading a simplified patch along the lines I suggested, plus added
          test cases for right outer join too, and made more explicit tests that outer join is gone after rewrite (RuntimeStatisticsParser#usedNLLeftOuterJoin).
          Rerunning regressions.

          Show
          Dag H. Wanvik added a comment - Uploading a simplified patch along the lines I suggested, plus added test cases for right outer join too, and made more explicit tests that outer join is gone after rewrite (RuntimeStatisticsParser#usedNLLeftOuterJoin). Rerunning regressions.
          Hide
          Dag H. Wanvik added a comment -

          Regressions passed, committed as svn 891015.

          Show
          Dag H. Wanvik added a comment - Regressions passed, committed as svn 891015.
          Hide
          Knut Anders Hatlen added a comment -

          Verified that the three statements had similar performance on trunk. Closing.

          Show
          Knut Anders Hatlen added a comment - Verified that the three statements had similar performance on trunk. Closing.

            People

            • Assignee:
              Dag H. Wanvik
              Reporter:
              Knut Anders Hatlen
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development