HIVE-22363

ReduceDeduplication may leave an invalid GroupByOperator behind in some cases

Details

    Description

      Since HIVE-11387, reduce deduplication may traverse GroupByOperators as well.

      However, the removal logic only removes the first parent, so if some other operator (a FIL in this case) sits between the reduce sink and the GroupByOperator, the removal may not happen and an invalid GroupByOperator is left behind. The following query reproduces the problem:

      set hive.cbo.enable=false;
      
      drop table if exists xl1;
      create table xl1 as
      select '1' as mdl_yr_desc, 2 as seq_no,'3' as opt_desc1,4 as opt_desc,1 as row_num;
      
      explain
      select trim(base.mdl_yr_desc) mdl_yr_desc, trim(base.opt_desc) opt_desc
      from
      (
          SELECT trim(mdl_yr_desc) mdl_yr_desc, concat_ws(' ', collect_set(trim(opt_desc1))) AS opt_desc
          from
          (
              select t14304.* 
              from
              (
                  select * from xl1
              ) t14304  
              where row_num = 1
              order by trim(mdl_yr_desc), cast(seq_no as int) asc
          ) x
          group by trim(mdl_yr_desc)
      ) base
      inner join
          (
              select 1 as v
          ) dedup
          on  trim(base.mdl_yr_desc) != dedup.v
      group by trim(base.mdl_yr_desc), trim(base.opt_desc) ;
      

      Attachments

        1. HIVE-22363.04.patch
          2 kB
          Zoltan Haindrich
        2. HIVE-22363.04.patch
          2 kB
          Zoltan Haindrich
        3. HIVE-22363.03.patch
          58 kB
          Zoltan Haindrich
        4. HIVE-22363.02.patch
          32 kB
          Zoltan Haindrich
        5. HIVE-22363.01.patch
          6 kB
          Zoltan Haindrich

            People

              kgyrtkirk Zoltan Haindrich

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m