IMPALA-3125

Incorrect assignment of outer join On-clause that only references one side of the join.

    Details

      Description

      Impala may return incorrect results for queries with an outer join whose On-clause contains a predicate that references at least two tables, but no table from the right-hand side of that join.

      Example query to repro and its plan:

      select a.id aid, b.id bid, a.int_col aint, b.int_col bint
      from functional.alltypes a
      inner join functional.alltypes b
        on a.int_col = b.int_col
      left outer join functional.alltypes c
        on a.id = b.id and b.bigint_col = c.bigint_col
      
      +-----------------------------------------------------------+
      | Explain String                                            |
      +-----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=320.08MB VCores=3 |
      |                                                           |
      | 08:EXCHANGE [UNPARTITIONED]                               |
      | |                                                         |
      | 04:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                 |
      | |  hash predicates: b.bigint_col = c.bigint_col           |  <--- b.id = a.id should be here and not in the join below
      | |                                                         |
      | |--07:EXCHANGE [BROADCAST]                                |
      | |  |                                                      |
      | |  02:SCAN HDFS [functional.alltypes c]                   |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |                                                         |
      | 03:HASH JOIN [INNER JOIN, PARTITIONED]                    |
      | |  hash predicates: b.int_col = a.int_col, b.id = a.id    |
      | |  runtime filters: RF000 <- a.int_col, RF001 <- a.id     |
      | |                                                         |
      | |--06:EXCHANGE [HASH(a.int_col,a.id)]                     |
      | |  |                                                      |
      | |  00:SCAN HDFS [functional.alltypes a]                   |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |                                                         |
      | 05:EXCHANGE [HASH(b.int_col,b.id)]                        |
      | |                                                         |
      | 01:SCAN HDFS [functional.alltypes b]                      |
      |    partitions=24/24 files=24 size=478.45KB                |
      |    runtime filters: RF000 -> b.int_col, RF001 -> b.id     |
      +-----------------------------------------------------------+
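
To see why the placement of `a.id = b.id` matters, the sketch below replays the repro on two rows. It uses SQLite through Python's `sqlite3` module purely as a stand-in engine, and the tiny `alltypes` table is a made-up substitute for `functional.alltypes`; neither is part of the original report. The second query spells out what the plan above effectively computes, with `a.id = b.id` evaluated in the inner join.

```python
import sqlite3

# In-memory stand-in for functional.alltypes (assumption: only three columns matter).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alltypes (id INTEGER, int_col INTEGER, bigint_col INTEGER)"
)
# Two rows that share int_col but differ in id and bigint_col.
conn.executemany(
    "INSERT INTO alltypes VALUES (?, ?, ?)", [(0, 1, 10), (1, 1, 20)]
)

# The repro query as written: a.id = b.id belongs to the outer join's On-clause,
# so it can only decide whether c matches, never drop an (a, b) pair.
intended_sql = """
select a.id aid, b.id bid, a.int_col aint, b.int_col bint
from alltypes a
inner join alltypes b
  on a.int_col = b.int_col
left outer join alltypes c
  on a.id = b.id and b.bigint_col = c.bigint_col
order by aid, bid
"""

# What the plan in the report effectively evaluates: a.id = b.id has moved
# into the inner join, where it filters (a, b) pairs.
misassigned_sql = """
select a.id aid, b.id bid, a.int_col aint, b.int_col bint
from alltypes a
inner join alltypes b
  on a.int_col = b.int_col and a.id = b.id
left outer join alltypes c
  on b.bigint_col = c.bigint_col
order by aid, bid
"""

intended_rows = conn.execute(intended_sql).fetchall()
misassigned_rows = conn.execute(misassigned_sql).fetchall()

print("intended:   ", intended_rows)
print("misassigned:", misassigned_rows)
```

On this data the query as written keeps all four (a, b) pairs, while the misassigned form keeps only the two pairs with equal ids, so the two result sets differ.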
      

        Activity

        alex.behm Alexander Behm added a comment -

        Not tracking this issue as a blocker because the estimated likelihood of hitting it is low. Also, the fix could regress other, more common queries.

        jrussell John Russell added a comment -

        Took the issue / repro / workaround basically verbatim for known issues.

        Blanked out "doc text" field so this issue doesn't show up on my to-do list.

        alex.behm Alexander Behm added a comment -

        commit 12cc5081783e435bbd2e577e8f7666c1ebe7d28a
        Author: Alex Behm <alex.behm@cloudera.com>
        Date: Mon Nov 7 17:32:57 2016 -0800

        IMPALA-3125: Fix assignment of equality predicates from an outer-join On-clause.

        Impala used to incorrectly assign On-clause equality predicates from an
        outer join if those predicates referenced multiple tables, but only one
        side of the outer join.

        The fix is to add an additional check in Analyzer.getEqJoinConjuncts()
        to prevent that incorrect assignment.

        Change-Id: I719e0eeacccad070b1f9509d80aaf761b572add0
        Reviewed-on: http://gerrit.cloudera.org:8080/4986
        Reviewed-by: Alex Behm <alex.behm@cloudera.com>
        Tested-by: Internal Jenkins
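
The kind of check the commit message describes can be illustrated with a small model. This is a hypothetical sketch, not the code in `Analyzer.getEqJoinConjuncts()`: the `Predicate` and `Join` classes, the `is_eq_join_conjunct` function, and the `check_outer_join_origin` flag are all invented here for illustration. The idea it assumes is that an equality predicate taken from an outer join's On-clause may only be used as a join conjunct at that outer join.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Predicate:
    text: str
    tables: FrozenSet[str]
    # Plan node id of the outer join whose On-clause contains this predicate,
    # or None if it does not come from an outer-join On-clause.
    outer_join_id: Optional[int] = None


@dataclass(frozen=True)
class Join:
    node_id: int
    left_tables: FrozenSet[str]
    right_table: str


def is_eq_join_conjunct(pred: Predicate, join: Join,
                        check_outer_join_origin: bool = True) -> bool:
    """Return True if pred may serve as an equi-join conjunct at join."""
    # Additional check (hypothetical model of the fix): a predicate from an
    # outer join's On-clause must not be assigned to a different join.
    if check_outer_join_origin and pred.outer_join_id not in (None, join.node_id):
        return False
    # Base rule: the predicate must reference both sides of this join.
    return bool(pred.tables & join.left_tables) and join.right_table in pred.tables


# Joins from the plan in the description: 03 joins b with a, 04 joins (a, b) with c.
join_03 = Join(3, frozenset({"b"}), "a")
join_04 = Join(4, frozenset({"a", "b"}), "c")

id_pred = Predicate("a.id = b.id", frozenset({"a", "b"}), outer_join_id=4)
bigint_pred = Predicate("b.bigint_col = c.bigint_col", frozenset({"b", "c"}),
                        outer_join_id=4)

# Without the origin check, a.id = b.id looks like a valid conjunct for join 03.
print(is_eq_join_conjunct(id_pred, join_03, check_outer_join_origin=False))
# With the check, it is rejected there and stays with the outer join.
print(is_eq_join_conjunct(id_pred, join_03))
print(is_eq_join_conjunct(bigint_pred, join_04))
```

In this model `a.id = b.id` is also not an equi-join conjunct at join 04, because it does not reference `c`; it would have to be evaluated there as a non-hash join predicate.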


          People

          • Assignee:
            alex.behm Alexander Behm
            Reporter:
            alex.behm Alexander Behm
