Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-12465

Hive might produce wrong results when (outer) joins are merged

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.0, 2.0.0
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: None
    • Labels:
      None
    • Target Version/s:

      Description

      Consider the following query:

      select * from
        (select * from tab where tab.key = 0)a
      full outer join
        (select * from tab_part where tab_part.key = 98)b
      join
        tab_part c
      on a.key = b.key and b.key = c.key;
      

      Hive should execute the full outer join operation (without ON clause) and then the join operation (ON a.key = b.key and b.key = c.key). Instead, it merges both joins, generating the following plan:

      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Map Operator Tree:
                TableScan
                  alias: tab
                  filterExpr: (key = 0) (type: boolean)
                  Statistics: Num rows: 242 Data size: 22748 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: (key = 0) (type: boolean)
                    Statistics: Num rows: 121 Data size: 11374 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: 0 (type: int), value (type: string), ds (type: string)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 121 Data size: 11374 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 121 Data size: 11374 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: string), _col2 (type: string)
                TableScan
                  alias: tab_part
                  filterExpr: (key = 98) (type: boolean)
                  Statistics: Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: (key = 98) (type: boolean)
                    Statistics: Num rows: 250 Data size: 23500 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: 98 (type: int), value (type: string), ds (type: string)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 250 Data size: 23500 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 250 Data size: 23500 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: string), _col2 (type: string)
                TableScan
                  alias: c
                  Statistics: Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: key (type: int)
                    sort order: +
                    Map-reduce partition columns: key (type: int)
                    Statistics: Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE
                    value expressions: value (type: string), ds (type: string)
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Outer Join 0 to 1
                     Inner Join 1 to 2
                keys:
                  0 _col0 (type: int)
                  1 _col0 (type: int)
                  2 key (type: int)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
                Statistics: Num rows: 1100 Data size: 103400 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1100 Data size: 103400 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
      

      That plan is equivalent to the following query, which is different than the original one:

      select * from
        (select * from tab where tab.key = 0)a
      full outer join
        (select * from tab_part where tab_part.key = 98)b
      on a.key = b.key
      join
        tab_part c
      on b.key = c.key;
      

      It seems to be a problem in the recognition of join operations that can be merged into a single multijoin operator.

        Attachments

        1. HIVE-12465.patch
          2 kB
          Jesus Camacho Rodriguez
        2. HIVE-12465.01.patch
          30 kB
          Jesus Camacho Rodriguez
        3. HIVE-12465.02.patch
          59 kB
          Jesus Camacho Rodriguez
        4. HIVE-12465-branch1.patch
          3 kB
          Jesus Camacho Rodriguez

          Issue Links

            Activity

              People

              • Assignee:
                jcamachorodriguez Jesus Camacho Rodriguez
                Reporter:
                jcamachorodriguez Jesus Camacho Rodriguez
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: