Hive / HIVE-22735

TopNKey operator deduplication


Details

    Description

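      In some cases more than one Top N Key (TNK) operator in the same operator tree has the same key expressions, or the key expressions differ only by a constant column. In most of these cases only one TNK operator should remain. In the plan below, TNK_60 (keys: _col102, _col93, 0L) and TNK_58 (keys: _col102, _col93) have the same top n (50) and differ only in the constant key 0L.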

      +----------------------------------------------------+
      |                      Explain                       |
      +----------------------------------------------------+
      | Plan not optimized by CBO.                         |
      |                                                    |
      | Vertex dependency in root stage                    |
      | Map 1 <- Reducer 8 (BROADCAST_EDGE)                |
      | Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE) |
      | Reducer 3 <- Reducer 2 (SIMPLE_EDGE)               |
      | Reducer 4 <- Reducer 3 (SIMPLE_EDGE)               |
      | Reducer 8 <- Map 7 (CUSTOM_SIMPLE_EDGE)            |
      |                                                    |
      | Stage-0                                            |
      |   Fetch Operator                                   |
      |     limit:50                                       |
      |     Stage-1                                        |
      |       Reducer 4 vectorized                         |
      |       File Output Operator [FS_127]                |
      |         Limit [LIM_126] (rows=50 width=538)        |
      |           Number of rows:50                        |
      |           Select Operator [SEL_125] (rows=190 width=538) |
      |             Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6"] |
      |           <-Reducer 3 [SIMPLE_EDGE]                |
      |             SHUFFLE [RS_30]                        |
      |               Select Operator [SEL_29] (rows=190 width=538) |
      |                 Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6"] |
      |                 Group By Operator [GBY_28] (rows=190 width=538) |
      |                   Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6"],aggregations:["avg(VALUE._col0)","avg(VALUE._col1)","avg(VALUE._col2)","avg(VALUE._col3)"],keys:KEY._col0, KEY._col1, KEY._col2 |
      |                 <-Reducer 2 [SIMPLE_EDGE]          |
      |                   SHUFFLE [RS_27]                  |
      |                     PartitionCols:_col0, _col1, _col2 |
      |                     Group By Operator [GBY_26] (rows=190 width=1134) |
      |                       Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6"],aggregations:["avg(_col9)","avg(_col11)","avg(_col18)","avg(_col12)"],keys:_col102, _col93, 0L |
      |                       Top N Key Operator [TNK_60] (rows=127 width=234) |
      |                         keys:_col102, _col93, 0L,top n:50 |
      |                         Select Operator [SEL_25] (rows=127 width=234) |
      |                           Output:["_col9","_col11","_col12","_col18","_col93","_col102"] |
      |                           Top N Key Operator [TNK_58] (rows=127 width=234) |
      |                             keys:_col102, _col93,top n:50 |
      |                             Filter Operator [FIL_49] (rows=127 width=234) |
      |                               predicate:((_col22 = _col38) and (_col1 = _col101) and (_col6 = _col69) and (_col3 = _col26)) |
      |                               Map Join Operator [MAPJOIN_102] (rows=2044 width=232) |
      |                                 Conds:MAPJOIN_101._col1=RS_123.i_item_sk(Inner),Output:["_col1","_col3","_col6","_col9","_col11","_col12","_col18","_col22","_col26","_col38","_col69","_col93","_col101","_col102"] |
      |                               <-Map 9 [BROADCAST_EDGE] vectorized |
      |                                 BROADCAST [RS_123] |
      |                                   PartitionCols:i_item_sk |
      |                                   Filter Operator [FIL_122] (rows=204000 width=108) |
      |                                     predicate:i_item_sk is not null |
      |                                     TableScan [TS_4] (rows=204000 width=108) |
      |                                       tpcds_bin_partitioned_orc_100@item,item, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["i_item_sk","i_item_id"] |
      |                               <-Map Join Operator [MAPJOIN_101] (rows=2010 width=118) |
      |                                   Conds:MAPJOIN_100._col6=RS_107.s_store_sk(Inner),Output:["_col1","_col3","_col6","_col9","_col11","_col12","_col18","_col22","_col26","_col38","_col69","_col93"] |
      |                                 <-Map 7 [BROADCAST_EDGE] vectorized |
      |                                   PARTITION_ONLY_SHUFFLE [RS_107] |
      |                                     PartitionCols:s_store_sk |
      |                                     Filter Operator [FIL_106] (rows=402 width=94) |
      |                                       predicate:s_store_sk is not null |
      |                                       TableScan [TS_3] (rows=402 width=94) |
      |                                         tpcds_bin_partitioned_orc_100@store,store, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["s_store_sk","s_state"] |
      |                                 <-Map Join Operator [MAPJOIN_100] (rows=9604000 width=24) |
      |                                     Conds:MERGEJOIN_99._col22=RS_118.d_date_sk(Inner),Output:["_col1","_col3","_col6","_col9","_col11","_col12","_col18","_col22","_col26","_col38"] |
      |                                   <-Map 6 [BROADCAST_EDGE] vectorized |
      |                                     BROADCAST [RS_118] |
      |                                       PartitionCols:d_date_sk |
      |                                       Filter Operator [FIL_117] (rows=73049 width=8) |
      |                                         predicate:d_date_sk is not null |
      |                                         TableScan [TS_2] (rows=73049 width=8) |
      |                                           tpcds_bin_partitioned_orc_100@date_dim,date_dim, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["d_date_sk"] |
      |                                     Dynamic Partitioning Event Operator [EVENT_121] (rows=1 width=8) |
      |                                       Group By Operator [GBY_120] (rows=1 width=8) |
      |                                         Output:["_col0"],keys:_col0 |
      |                                         Select Operator [SEL_119] (rows=73049 width=8) |
      |                                           Output:["_col0"] |
      |                                            Please refer to the previous Filter Operator [FIL_117] |
      |                                   <-Merge Join Operator [MERGEJOIN_99] (rows=9604000 width=16) |
      |                                       Conds:RS_114.ss_cdemo_sk=RS_116.cd_demo_sk(Inner),Output:["_col1","_col3","_col6","_col9","_col11","_col12","_col18","_col22","_col26"] |
      |                                     <-Map 1 [SIMPLE_EDGE] vectorized |
      |                                       SHUFFLE [RS_114] |
      |                                         PartitionCols:ss_cdemo_sk |
      |                                         Filter Operator [FIL_113] (rows=235814137 width=353) |
      |                                           predicate:(ss_cdemo_sk is not null and ss_store_sk is not null and ss_item_sk is not null and ss_store_sk BETWEEN DynamicValue(RS_17_store_s_store_sk_min) AND DynamicValue(RS_17_store_s_store_sk_max) and in_bloom_filter(ss_store_sk, DynamicValue(RS_17_store_s_store_sk_bloom_filter))) |
      |                                           TableScan [TS_0] (rows=275041999 width=723) |
      |                                             tpcds_bin_partitioned_orc_100@store_sales,store_sales, ACID table,Tbl:COMPLETE,Col:PARTIAL,Output:["ss_item_sk","ss_cdemo_sk","ss_store_sk","ss_quantity","ss_list_price","ss_sales_price","ss_coupon_amt"] |
      |                                           <-Reducer 8 [BROADCAST_EDGE] vectorized |
      |                                             BROADCAST [RS_112] |
      |                                               Group By Operator [GBY_111] (rows=1 width=24) |
      |                                                 Output:["_col0","_col1","_col2"],aggregations:["min(VALUE._col0)","max(VALUE._col1)","bloom_filter(VALUE._col2, expectedEntries=1000000)"] |
      |                                     <-Map 5 [SIMPLE_EDGE] vectorized |
      |                                       SHUFFLE [RS_116] |
      |                                         PartitionCols:cd_demo_sk |
      |                                         Filter Operator [FIL_115] (rows=1920800 width=8) |
      |                                           predicate:cd_demo_sk is not null |
      |                                           TableScan [TS_1] (rows=1920800 width=8) |
      |                                             tpcds_bin_partitioned_orc_100@customer_demographics,customer_demographics, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["cd_demo_sk"] |
      |                                                    |
      +----------------------------------------------------+
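
      The deduplication described above can be illustrated with a minimal sketch. It is not the code from the attached patches and does not use Hive's operator classes: `Op`, `Kind`, `isConstant`, and `dedup` are hypothetical names introduced here. The sketch assumes a linear operator chain ordered from data source to sink, assumes column names are unchanged across Select operators, ignores sort direction and null ordering, and keeps the TNK operator closest to the data source (which operator should survive is an assumption, not something the issue states).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TopNKeyDedupSketch {

    public enum Kind { TOP_N_KEY, SELECT, OTHER }

    /** Hypothetical, simplified operator model (not Hive's Operator class). */
    public static final class Op {
        final String name;
        final Kind kind;
        final List<String> keys; // key expressions as strings; empty for non-TNK operators
        final int topN;          // only meaningful for TOP_N_KEY

        Op(String name, Kind kind, int topN, String... keys) {
            this.name = name;
            this.kind = kind;
            this.topN = topN;
            this.keys = Arrays.asList(keys);
        }
    }

    /** Assumption: a key is a constant if it looks like an integer literal (e.g. 0L) or a quoted string. */
    static boolean isConstant(String expr) {
        return expr.matches("-?\\d+L?|'.*'");
    }

    static List<String> nonConstantKeys(Op op) {
        List<String> result = new ArrayList<>();
        for (String key : op.keys) {
            if (!isConstant(key)) {
                result.add(key);
            }
        }
        return result;
    }

    /**
     * Drops a TNK operator when an earlier TNK with the same top n and the same
     * non-constant keys is separated from it only by Select operators.
     * The chain is ordered from data source to sink.
     */
    public static List<Op> dedup(List<Op> chain) {
        List<Op> result = new ArrayList<>();
        Op lastTnk = null;
        for (Op op : chain) {
            if (op.kind == Kind.TOP_N_KEY) {
                if (lastTnk != null && lastTnk.topN == op.topN
                        && nonConstantKeys(lastTnk).equals(nonConstantKeys(op))) {
                    continue; // redundant: same keys up to constant columns
                }
                lastTnk = op;
            } else if (op.kind != Kind.SELECT) {
                lastTnk = null; // any other operator may change the rows or keys
            }
            result.add(op);
        }
        return result;
    }

    /** The FIL_49 .. GBY_26 fragment of the plan above, ordered source to sink. */
    public static List<Op> examplePlan() {
        return Arrays.asList(
                new Op("FIL_49", Kind.OTHER, 0),
                new Op("TNK_58", Kind.TOP_N_KEY, 50, "_col102", "_col93"),
                new Op("SEL_25", Kind.SELECT, 0),
                new Op("TNK_60", Kind.TOP_N_KEY, 50, "_col102", "_col93", "0L"),
                new Op("GBY_26", Kind.OTHER, 0));
    }

    public static String names(List<Op> ops) {
        List<String> names = new ArrayList<>();
        for (Op op : ops) {
            names.add(op.name);
        }
        return String.join(" -> ", names);
    }

    public static void main(String[] args) {
        System.out.println(names(dedup(examplePlan())));
        // prints: FIL_49 -> TNK_58 -> SEL_25 -> GBY_26
    }
}
```

      In this sketch TNK_60 is dropped because its keys match those of TNK_58 once the constant 0L is ignored. A real implementation would also have to map column names through Select operators and compare sort order and null ordering before treating two TNK operators as equivalent.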
      

      Attachments

        1. HIVE-22735.2.patch
          117 kB
          Krisztian Kasa
        2. HIVE-22735.1.patch
          46 kB
          Krisztian Kasa

          People

            Krisztian Kasa (kkasa)