Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-23833

wrong explain and result when full join with join

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.1.1
    • None
    • Hive
    • None

    Description

      Reproduce:

      1. Create three tables, mytest_t1, mytest_t2, mytest_t4
      2. hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = t2.material_code;"
      3. hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = t2.material_code join dw_dev.mytest_t5 t5 on t5.material_code = coalesce(t1.material_code,t2.material_code);"
      4. expect output row are over 6000, but actually get 685 rows

      2 - explain

       Map Reduce
      Map Operator Tree:
      TableScan
      alias: t1
      Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
      Select Operator
      expressions: material_code (type: string), wh_guid (type: string)
      outputColumnNames: _col0, _col1
      Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
      Reduce Output Operator
      key expressions: _col0 (type: string)
      sort order: +
      Map-reduce partition columns: _col0 (type: string)
      Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
      value expressions: _col1 (type: string)
      TableScan
      alias: t2
      Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
      Select Operator
      expressions: material_code (type: string), wh_guid (type: string)
      outputColumnNames: _col0, _col1
      Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
      Reduce Output Operator
      key expressions: _col0 (type: string)
      sort order: +
      Map-reduce partition columns: _col0 (type: string)
      Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
      value expressions: _col1 (type: string)
      Reduce Operator Tree:
      Join Operator
      condition map:
      Outer Join 0 to 1
      keys:
      0 _col0 (type: string)
      1 _col0 (type: string)
      outputColumnNames: _col1, _col3
      Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
      Select Operator
      expressions: COALESCE(_col1,_col3) (type: string)
      outputColumnNames: _col0
      Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
      File Output Operator
      compressed: false
      Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
      table:
      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 

      3 - explain

      STAGE PLANS:
      Stage: Stage-7
      Map Reduce Local Work
      Alias -> Map Local Tables:
      $hdt$_1:t2
      Fetch Operator
      limit: -1
      $hdt$_2:t5
      Fetch Operator
      limit: -1
      Alias -> Map Local Operator Tree:
      $hdt$_1:t2
      TableScan
      alias: t2
      Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
      Filter Operator
      predicate: material_code is not null (type: boolean)
      Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
      Select Operator
      expressions: material_code (type: string), wh_guid (type: string)
      outputColumnNames: _col0, _col1
      Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
      HashTable Sink Operator
      keys:
      0 _col0 (type: string)
      1 _col0 (type: string)
      $hdt$_2:t5
      TableScan
      alias: t5
      Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
      Filter Operator
      predicate: material_code is not null (type: boolean)
      Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
      Select Operator
      expressions: material_code (type: string)
      outputColumnNames: _col0
      Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
      HashTable Sink Operator
      keys:
      0 COALESCE(_col0,_col2) (type: string)
      1 _col0 (type: string)

      Stage: Stage-5
      Map Reduce
      Map Operator Tree:
      TableScan
      alias: t1
      Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
      Filter Operator
      predicate: material_code is not null (type: boolean)
      Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
      Select Operator
      expressions: material_code (type: string), wh_guid (type: string)
      outputColumnNames: _col0, _col1
      Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
      Map Join Operator
      condition map:
      Inner Join 0 to 1
      keys:
      0 _col0 (type: string)
      1 _col0 (type: string)
      outputColumnNames: _col0, _col1, _col2, _col3
      Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
      Map Join Operator
      condition map:
      Inner Join 0 to 1
      keys:
      0 COALESCE(_col0,_col2) (type: string)
      1 _col0 (type: string)
      outputColumnNames: _col1, _col3
      Statistics: Num rows: 14219 Data size: 2673303 Basic stats: COMPLETE Column stats: NONE
      Select Operator
      expressions: COALESCE(_col1,_col3) (type: string)
      outputColumnNames: _col0
      Statistics: Num rows: 14219 Data size: 2673303 Basic stats: COMPLETE Column stats: NONE
      File Output Operator
      compressed: false
      Statistics: Num rows: 14219 Data size: 2673303 Basic stats: COMPLETE Column stats: NONE
      table:
      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

      Attachments

        Activity

          People

            Unassigned Unassigned
            chuanjie.duan chuanjie.duan
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: