Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-1342

Predicate push down get error result when sub-queries have the same alias name

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 0.6.0
    • 0.8.0
    • Query Processor
    • None
    • ppd

    Description

      Query is over-optimized by PPD when sub-queries have the same alias name, see the query:

      -------------------------------
      create table if not exists dm_fact_buyer_prd_info_d (
      category_id string
      ,gmv_trade_num int
      ,user_id int
      )
      PARTITIONED BY (ds int);

      set hive.optimize.ppd=true;
      set hive.map.aggr=true;

      explain select category_id1,category_id2,assoc_idx
      from (
      select
      category_id1
      , category_id2
      , count(distinct user_id) as assoc_idx
      from (
      select
      t1.category_id as category_id1
      , t2.category_id as category_id2
      , t1.user_id
      from (
      select category_id, user_id
      from dm_fact_buyer_prd_info_d
      group by category_id, user_id ) t1
      join (
      select category_id, user_id
      from dm_fact_buyer_prd_info_d
      group by category_id, user_id ) t2 on t1.user_id=t2.user_id
      ) t1
      group by category_id1, category_id2 ) t_o
      where category_id1 <> category_id2
      and assoc_idx > 2;

      -----------------------------
      The query above will fail when execute, throwing exception: "can not cast UDFOpNotEqual(Text, IntWritable) to UDFOpNotEqual(Text, Text)".

      I explained the query and the execute plan looks really wired ( only Stage-1, see the highlighted predicate):

      -------------------------------
      Stage: Stage-1
      Map Reduce
      Alias -> Map Operator Tree:
      t_o:t1:t1:dm_fact_buyer_prd_info_d
      TableScan
      alias: dm_fact_buyer_prd_info_d
      Filter Operator
      predicate:
      expr: (category_id <> user_id)
      type: boolean
      Select Operator
      expressions:
      expr: category_id
      type: string
      expr: user_id
      type: bigint
      outputColumnNames: category_id, user_id
      Group By Operator
      keys:
      expr: category_id
      type: string
      expr: user_id
      type: bigint
      mode: hash
      outputColumnNames: _col0, _col1
      Reduce Output Operator
      key expressions:
      expr: _col0
      type: string
      expr: _col1
      type: bigint
      sort order: ++
      Map-reduce partition columns:
      expr: _col0
      type: string
      expr: _col1
      type: bigint
      tag: -1
      Reduce Operator Tree:
      Group By Operator
      keys:
      expr: KEY._col0
      type: string
      expr: KEY._col1
      type: bigint
      mode: mergepartial
      outputColumnNames: _col0, _col1
      Select Operator
      expressions:
      expr: _col0
      type: string
      expr: _col1
      type: bigint
      outputColumnNames: _col0, _col1
      File Output Operator
      compressed: true
      GlobalTableId: 0
      table:
      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
      ----------------------------------

      If disabling predicate push down (set hive.optimize.ppd=true), the error is gone; I tried disabling map side aggregate, the error is gone,too.

      Changing the alias of subquery 't1' (either the inner one or the join result), the bug disappears, too.

      Attachments

        1. cmd.hql
          0.8 kB
          Ted Xu
        2. explain
          11 kB
          Ted Xu
        3. ppd_same_alias_1.patch
          14 kB
          Ted Xu
        4. ppd_same_alias_2.patch
          17 kB
          Ted Xu
        5. HIVE-1342v1.patch
          15 kB
          Charles Chen
        6. HIVE-1342v2.patch
          15 kB
          Charles Chen
        7. HIVE-1342v3.patch
          15 kB
          Charles Chen
        8. HIVE-1342v4.patch
          30 kB
          Charles Chen

        Issue Links

          Activity

            People

              ccy Charles Chen
              tedxu Ted Xu
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: