Hive / HIVE-16421

Runtime filtering breaks user-level explain



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 3.0.0



      SELECT LAG(COALESCE(t2.int_col_14, t1.int_col_80),22) OVER (ORDER BY t1.tinyint_col_52 DESC) AS int_col FROM table_6 t1 INNER JOIN table_14 t2 ON ((t2.decimal0101_col_55) = (t1.decimal0101_col_9));
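
A rough Python model of what the repro query computes. It covers semantics only, not how Hive executes the query. The table contents are hypothetical lists of dicts keyed by the column names in the query, and the sketch assumes `tinyint_col_52` is never NULL so that NULL ordering does not matter. The OVER clause has no PARTITION BY, so every joined row lands in one partition. This matches the constant partition key `0` that appears in both plans.

```python
from decimal import Decimal
from typing import Optional


def repro_query(t1: list[dict], t2: list[dict], offset: int = 22) -> list[Optional[int]]:
    # INNER JOIN ON t2.decimal0101_col_55 = t1.decimal0101_col_9.
    # NULL keys never match, which is why both plans add "is not null" filters.
    joined = [
        (r1, r2)
        for r1 in t1
        for r2 in t2
        if r1["decimal0101_col_9"] is not None
        and r1["decimal0101_col_9"] == r2["decimal0101_col_55"]
    ]
    # OVER (ORDER BY t1.tinyint_col_52 DESC); assumes the sort key is never NULL.
    joined.sort(key=lambda pair: pair[0]["tinyint_col_52"], reverse=True)
    # COALESCE(t2.int_col_14, t1.int_col_80)
    values = [
        r2["int_col_14"] if r2["int_col_14"] is not None else r1["int_col_80"]
        for r1, r2 in joined
    ]
    # LAG(value, offset): the value `offset` rows earlier, or NULL (None) if there is none.
    return [values[i - offset] if i >= offset else None for i in range(len(values))]
```

With an offset of 22, the first 22 output rows are always NULL, and every later row repeats the COALESCE value from 22 rows before it in descending `tinyint_col_52` order.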

      Without runtime filtering (user-level explain output):

      |                                                                                                           Explain                                                                                                           |
      | Plan not optimized by CBO.                                                                                                                                                                                                  |
      |                                                                                                                                                                                                                             |
      | Vertex dependency in root stage                                                                                                                                                                                             |
      | Map 1 <- Map 3 (BROADCAST_EDGE)                                                                                                                                                                                             |
      | Reducer 2 <- Map 1 (SIMPLE_EDGE)                                                                                                                                                                                            |
      |                                                                                                                                                                                                                             |
      | Stage-0                                                                                                                                                                                                                     |
      |    Fetch Operator                                                                                                                                                                                                           |
      |       limit:-1                                                                                                                                                                                                              |
      |       Stage-1                                                                                                                                                                                                               |
      |          Reducer 2                                                                                                                                                                                                          |
      |          File Output Operator [FS_364]                                                                                                                                                                                      |
      |             compressed:false                                                                                                                                                                                                |
      |             Statistics:Num rows: 74781721 Data size: 299126884 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                 |
      |             table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}  |
      |             Select Operator [SEL_362]                                                                                                                                                                                       |
      |                outputColumnNames:["_col0"]                                                                                                                                                                                  |
      |                Statistics:Num rows: 74781721 Data size: 299126884 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                              |
      |                PTF Operator [PTF_361]                                                                                                                                                                                       |
      |                   Function definitions:[{"Input definition":{"type:":"WINDOWING"}},{"order by:":"_col51(DESC)","name:":"windowingtablefunction","partition by:":"0"}]                                                       |
      |                   Statistics:Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                           |
      |                   Select Operator [SEL_360]                                                                                                                                                                                 |
      |                   |  outputColumnNames:["_col51","_col79","_col97"]                                                                                                                                                         |
      |                   |  Statistics:Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                        |
      |                   |<-Map 1 [SIMPLE_EDGE] vectorized                                                                                                                                                                         |
      |                      Reduce Output Operator [RS_375]                                                                                                                                                                        |
      |                         key expressions:0 (type: int), _col51 (type: tinyint)                                                                                                                                               |
      |                         Map-reduce partition columns:0 (type: int)                                                                                                                                                          |
      |                         sort order:+-                                                                                                                                                                                       |
      |                         Statistics:Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                     |
      |                         value expressions:_col79 (type: int), _col97 (type: int)                                                                                                                                            |
      |                         Map Join Operator [MAPJOIN_374]                                                                                                                                                                     |
      |                         |  condition map:[{"":"Inner Join 0 to 1"}]                                                                                                                                                         |
      |                         |  HybridGraceHashJoin:true                                                                                                                                                                         |
      |                         |  keys:{"Map 3":"decimal0101_col_55 (type: decimal(1,1))","Map 1":"decimal0101_col_9 (type: decimal(1,1))"}                                                                                        |
      |                         |  outputColumnNames:["_col51","_col79","_col97"]                                                                                                                                                   |
      |                         |  Statistics:Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                  |
      |                         |<-Map 3 [BROADCAST_EDGE] vectorized                                                                                                                                                                |
      |                         |  Reduce Output Operator [RS_372]                                                                                                                                                                  |
      |                         |     key expressions:decimal0101_col_55 (type: decimal(1,1))                                                                                                                                       |
      |                         |     Map-reduce partition columns:decimal0101_col_55 (type: decimal(1,1))                                                                                                                          |
      |                         |     sort order:+                                                                                                                                                                                  |
      |                         |     Statistics:Num rows: 26256 Data size: 2749496 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                    |
      |                         |     value expressions:int_col_14 (type: int)                                                                                                                                                      |
      |                         |     Filter Operator [FIL_371]                                                                                                                                                                     |
      |                         |        predicate:decimal0101_col_55 is not null (type: boolean)                                                                                                                                   |
      |                         |        Statistics:Num rows: 26256 Data size: 2749496 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                 |
      |                         |        TableScan [TS_353]                                                                                                                                                                         |
      |                         |           alias:t2                                                                                                                                                                                |
      |                         |           Statistics:Num rows: 29079 Data size: 117014275 Basic stats: COMPLETE Column stats: COMPLETE                                                                                            |
      |                         |<-Filter Operator [FIL_373]                                                                                                                                                                        |
      |                               predicate:decimal0101_col_9 is not null (type: boolean)                                                                                                                                       |
      |                               Statistics:Num rows: 48419 Data size: 5233788 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                    |
      |                               TableScan [TS_352]                                                                                                                                                                            |
      |                                  alias:t1                                                                                                                                                                                   |
      |                                  Statistics:Num rows: 53742 Data size: 200230374 Basic stats: COMPLETE Column stats: COMPLETE                                                                                               |
      |                                                                                                                                                                                                                             |

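In the runtime-filtering plan, Reducer 4 (fed by Map 3, the t2 scan) is broadcast to Map 1. The t1 scan then gains the predicate `decimal0101_col_9 BETWEEN DynamicValue(..._min) AND DynamicValue(..._max) and in_bloom_filter(decimal0101_col_9, DynamicValue(..._bloom_filter))`. The following minimal Python sketch shows that predicate. It assumes a toy, hand-rolled Bloom filter whose size and hash scheme are hypothetical, so it is not Hive's own Bloom filter implementation. The helper names `build_runtime_filter` and `probe_filter` are illustrative only.

```python
import hashlib
from decimal import Decimal


class SimpleBloomFilter:
    """Toy Bloom filter (hypothetical parameters): num_bits slots, num_hashes positions per key."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # at most 8: each position uses 4 bytes of a 32-byte SHA-256 digest
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Assumes keys with equal values have equal str() forms (true for fixed-scale decimal(1,1)).
        digest = hashlib.sha256(str(key).encode()).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.num_bits

    def add(self, key) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key) -> bool:
        return all(self.bits[p] for p in self._positions(key))


def build_runtime_filter(build_keys):
    # Build side (t2): min, max and Bloom filter over the non-NULL join keys.
    # Assumes at least one non-NULL key; min()/max() raise on an empty list.
    keys = [k for k in build_keys if k is not None]
    bloom = SimpleBloomFilter()
    for k in keys:
        bloom.add(k)
    return min(keys), max(keys), bloom


def probe_filter(key, key_min, key_max, bloom) -> bool:
    # Probe side (t1): key is not null AND key BETWEEN min AND max AND in_bloom_filter(key).
    return key is not None and key_min <= key <= key_max and bloom.might_contain(key)
```

A Bloom filter can return false positives but never false negatives. The scan-side filter may therefore pass a few t1 rows that have no match in t2, and the map join then discards them. Query results are unchanged; only the number of rows reaching the join shrinks.
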
      With runtime filtering (the output falls back to the regular, non-user-level explain format):

      |                                                                                                                                                 Explain                                                                                                                                                  |
      | STAGE DEPENDENCIES:                                                                                                                                                                                                                                                                                      |
      |   Stage-1 is a root stage                                                                                                                                                                                                                                                                                |
      |   Stage-0 depends on stages: Stage-1                                                                                                                                                                                                                                                                     |
      |                                                                                                                                                                                                                                                                                                          |
      | STAGE PLANS:                                                                                                                                                                                                                                                                                             |
      |   Stage: Stage-1                                                                                                                                                                                                                                                                                         |
      |     Tez                                                                                                                                                                                                                                                                                                  |
      |       DagId: hive_20170411232247_e177745a-39d0-4ae7-8ca0-871a137b36fa:1                                                                                                                                                                                                                                  |
      |       Edges:                                                                                                                                                                                                                                                                                             |
      |         Map 1 <- Map 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE)                                                                                                                                                                                                                                      |
      |         Reducer 2 <- Map 1 (SIMPLE_EDGE)                                                                                                                                                                                                                                                                 |
      |         Reducer 4 <- Map 3 (SIMPLE_EDGE)                                                                                                                                                                                                                                                                 |
      |       DagName:                                                                                                                                                                                                                                                                                           |
      |       Vertices:                                                                                                                                                                                                                                                                                          |
      |         Map 1                                                                                                                                                                                                                                                                                            |
      |             Map Operator Tree:                                                                                                                                                                                                                                                                           |
      |                 TableScan                                                                                                                                                                                                                                                                                |
      |                   alias: t1                                                                                                                                                                                                                                                                              |
      |                   filterExpr: (decimal0101_col_9 is not null and (decimal0101_col_9 BETWEEN DynamicValue(RS_7_t2_decimal0101_col_9_min) AND DynamicValue(RS_7_t2_decimal0101_col_9_max) and in_bloom_filter(decimal0101_col_9, DynamicValue(RS_7_t2_decimal0101_col_9_bloom_filter)))) (type: boolean)   |
      |                   Statistics: Num rows: 53742 Data size: 5809320 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                            |
      |                   Filter Operator                                                                                                                                                                                                                                                                        |
      |                     predicate: (decimal0101_col_9 is not null and (decimal0101_col_9 BETWEEN DynamicValue(RS_7_t2_decimal0101_col_9_min) AND DynamicValue(RS_7_t2_decimal0101_col_9_max) and in_bloom_filter(decimal0101_col_9, DynamicValue(RS_7_t2_decimal0101_col_9_bloom_filter)))) (type: boolean)  |
      |                     Statistics: Num rows: 48419 Data size: 5233908 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                          |
      |                     Select Operator                                                                                                                                                                                                                                                                      |
      |                       expressions: decimal0101_col_9 (type: decimal(1,1)), tinyint_col_52 (type: tinyint), int_col_80 (type: int)                                                                                                                                                                        |
      |                       outputColumnNames: _col0, _col1, _col2                                                                                                                                                                                                                                             |
      |                       Statistics: Num rows: 48419 Data size: 5233908 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                        |
      |                       Map Join Operator                                                                                                                                                                                                                                                                  |
      |                         condition map:                                                                                                                                                                                                                                                                   |
      |                              Inner Join 0 to 1                                                                                                                                                                                                                                                           |
      |                         keys:                                                                                                                                                                                                                                                                            |
      |                           0 _col0 (type: decimal(1,1))                                                                                                                                                                                                                                                   |
      |                           1 _col1 (type: decimal(1,1))                                                                                                                                                                                                                                                   |
      |                         outputColumnNames: _col1, _col2, _col3                                                                                                                                                                                                                                           |
      |                         input vertices:                                                                                                                                                                                                                                                                  |
      |                           1 Map 3                                                                                                                                                                                                                                                                        |
      |                         Statistics: Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                 |
      |                         Reduce Output Operator                                                                                                                                                                                                                                                           |
      |                           key expressions: 0 (type: int), _col1 (type: tinyint)                                                                                                                                                                                                                          |
      |                           sort order: +-                                                                                                                                                                                                                                                                 |
      |                           Map-reduce partition columns: 0 (type: int)                                                                                                                                                                                                                                    |
      |                           Statistics: Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                               |
      |                           value expressions: _col2 (type: int), _col3 (type: int)                                                                                                                                                                                                                        |
      |             Execution mode: vectorized, llap                                                                                                                                                                                                                                                             |
      |         Map 3                                                                                                                                                                                                                                                                                            |
      |             Map Operator Tree:                                                                                                                                                                                                                                                                           |
      |                 TableScan                                                                                                                                                                                                                                                                                |
      |                   alias: t2                                                                                                                                                                                                                                                                              |
      |                   filterExpr: decimal0101_col_55 is not null (type: boolean)                                                                                                                                                                                                                             |
      |                   Statistics: Num rows: 29079 Data size: 3045240 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                            |
      |                   Filter Operator                                                                                                                                                                                                                                                                        |
      |                     predicate: decimal0101_col_55 is not null (type: boolean)                                                                                                                                                                                                                            |
      |                     Statistics: Num rows: 26256 Data size: 2749612 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                          |
      |                     Select Operator                                                                                                                                                                                                                                                                      |
      |                       expressions: int_col_14 (type: int), decimal0101_col_55 (type: decimal(1,1))                                                                                                                                                                                                       |
      |                       outputColumnNames: _col0, _col1                                                                                                                                                                                                                                                    |
      |                       Statistics: Num rows: 26256 Data size: 2749612 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                        |
      |                       Reduce Output Operator                                                                                                                                                                                                                                                             |
      |                         key expressions: _col1 (type: decimal(1,1))                                                                                                                                                                                                                                      |
      |                         sort order: +                                                                                                                                                                                                                                                                    |
      |                         Map-reduce partition columns: _col1 (type: decimal(1,1))                                                                                                                                                                                                                         |
      |                         Statistics: Num rows: 26256 Data size: 2749612 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                      |
      |                         value expressions: _col0 (type: int)                                                                                                                                                                                                                                             |
      |                       Select Operator                                                                                                                                                                                                                                                                    |
      |                         expressions: _col1 (type: decimal(1,1))                                                                                                                                                                                                                                          |
      |                         outputColumnNames: _col0                                                                                                                                                                                                                                                         |
      |                         Statistics: Num rows: 26256 Data size: 2749612 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                      |
      |                         Group By Operator                                                                                                                                                                                                                                                                |
      |                           aggregations: min(_col0), max(_col0), bloom_filter(_col0, expectedEntries=17)                                                                                                                                                                                                  |
      |                           mode: hash                                                                                                                                                                                                                                                                     |
      |                           outputColumnNames: _col0, _col1, _col2                                                                                                                                                                                                                                         |
      |                           Statistics: Num rows: 1 Data size: 336 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                            |
      |                           Reduce Output Operator                                                                                                                                                                                                                                                         |
      |                             sort order:                                                                                                                                                                                                                                                                  |
      |                             Statistics: Num rows: 1 Data size: 336 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                          |
      |                             value expressions: _col0 (type: decimal(1,1)), _col1 (type: decimal(1,1)), _col2 (type: binary)                                                                                                                                                                              |
      |             Execution mode: vectorized, llap                                                                                                                                                                                                                                                             |
      |         Reducer 2                                                                                                                                                                                                                                                                                        |
      |             Execution mode: llap                                                                                                                                                                                                                                                                         |
      |             Reduce Operator Tree:                                                                                                                                                                                                                                                                        |
      |               Select Operator                                                                                                                                                                                                                                                                            |
      |                 expressions: KEY.reducesinkkey1 (type: tinyint), VALUE._col1 (type: int), VALUE._col2 (type: int)                                                                                                                                                                                        |
      |                 outputColumnNames: _col1, _col2, _col3                                                                                                                                                                                                                                                   |
      |                 Statistics: Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                         |
      |                 PTF Operator                                                                                                                                                                                                                                                                             |
      |                   Function definitions:                                                                                                                                                                                                                                                                  |
      |                       Input definition                                                                                                                                                                                                                                                                   |
      |                         input alias: ptf_0                                                                                                                                                                                                                                                               |
      |                         output shape: _col1: tinyint, _col2: int, _col3: int                                                                                                                                                                                                                             |
      |                         type: WINDOWING                                                                                                                                                                                                                                                                  |
      |                       Windowing table definition                                                                                                                                                                                                                                                         |
      |                         input alias: ptf_1                                                                                                                                                                                                                                                               |
      |                         name: windowingtablefunction                                                                                                                                                                                                                                                     |
      |                         order by: _col1 DESC NULLS LAST                                                                                                                                                                                                                                                  |
      |                         partition by: 0                                                                                                                                                                                                                                                                  |
      |                         raw input shape:                                                                                                                                                                                                                                                                 |
      |                         window functions:                                                                                                                                                                                                                                                                |
      |                             window function definition                                                                                                                                                                                                                                                   |
      |                               alias: LAG_window_0                                                                                                                                                                                                                                                        |
      |                               arguments: COALESCE(_col3,_col2), 22                                                                                                                                                                                                                                       |
      |                               name: LAG                                                                                                                                                                                                                                                                  |
      |                               window function: GenericUDAFLagEvaluator                                                                                                                                                                                                                                   |
      |                               window frame: PRECEDING(MAX)~FOLLOWING(MAX)                                                                                                                                                                                                                                |
      |                               isPivotResult: true                                                                                                                                                                                                                                                        |
      |                   Statistics: Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                       |
      |                   Select Operator                                                                                                                                                                                                                                                                        |
      |                     expressions: LAG_window_0 (type: int)                                                                                                                                                                                                                                                |
      |                     outputColumnNames: _col0                                                                                                                                                                                                                                                             |
      |                     Statistics: Num rows: 74781721 Data size: 299126884 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                     |
      |                     File Output Operator                                                                                                                                                                                                                                                                 |
      |                       compressed: false                                                                                                                                                                                                                                                                  |
      |                       Statistics: Num rows: 74781721 Data size: 299126884 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                   |
      |                       table:                                                                                                                                                                                                                                                                             |
      |                           input format: org.apache.hadoop.mapred.SequenceFileInputFormat                                                                                                                                                                                                                 |
      |                           output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat                                                                                                                                                                                                       |
      |                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                                                                                                                                                                                                                      |
      |         Reducer 4                                                                                                                                                                                                                                                                                        |
      |             Execution mode: vectorized, llap                                                                                                                                                                                                                                                             |
      |             Reduce Operator Tree:                                                                                                                                                                                                                                                                        |
      |               Group By Operator                                                                                                                                                                                                                                                                          |
      |                 aggregations: min(VALUE._col0), max(VALUE._col1), bloom_filter(VALUE._col2, expectedEntries=17)                                                                                                                                                                                          |
      |                 mode: final                                                                                                                                                                                                                                                                              |
      |                 outputColumnNames: _col0, _col1, _col2                                                                                                                                                                                                                                                   |
      |                 Statistics: Num rows: 1 Data size: 336 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                                      |
      |                 Reduce Output Operator                                                                                                                                                                                                                                                                   |
      |                   sort order:                                                                                                                                                                                                                                                                            |
      |                   Statistics: Num rows: 1 Data size: 336 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                                    |
      |                   value expressions: _col0 (type: decimal(1,1)), _col1 (type: decimal(1,1)), _col2 (type: binary)                                                                                                                                                                                        |
      |                                                                                                                                                                                                                                                                                                          |
      |   Stage: Stage-0                                                                                                                                                                                                                                                                                         |
      |     Fetch Operator                                                                                                                                                                                                                                                                                       |
      |       limit: -1                                                                                                                                                                                                                                                                                          |
      |       Processor Tree:                                                                                                                                                                                                                                                                                    |
      |         ListSink                                                                                                                                                                                                                                                                                         |
      |                                                                                                                                                                                                                                                                                                          |
      135 rows selected (2.348 seconds)
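The following is a minimal way to reproduce the comparison between this output and the "Without runtime filtering" output. It rests on two assumptions. First, the runtime filter in this plan is Hive's dynamic semijoin reduction, which is the `min`/`max`/`bloom_filter` aggregation over `table_14.decimal0101_col_55` in Map 3 and Reducer 4. Second, that filter is toggled by `hive.tez.dynamic.semijoin.reduction`, while `hive.explain.user=true` requests the user-level format. Check both setting names against the HiveConf of the release under test.

```sql
-- Assumption: hive.tez.dynamic.semijoin.reduction controls the runtime (semijoin) filter.
SET hive.explain.user=true;

-- With runtime filtering: expect the min/max/bloom_filter Group By seen in Reducer 4 above.
SET hive.tez.dynamic.semijoin.reduction=true;
EXPLAIN
SELECT LAG(COALESCE(t2.int_col_14, t1.int_col_80), 22) OVER (ORDER BY t1.tinyint_col_52 DESC) AS int_col
FROM table_6 t1
INNER JOIN table_14 t2 ON ((t2.decimal0101_col_55) = (t1.decimal0101_col_9));

-- Without runtime filtering: same query, for comparison with the user-level plan.
SET hive.tez.dynamic.semijoin.reduction=false;
EXPLAIN
SELECT LAG(COALESCE(t2.int_col_14, t1.int_col_80), 22) OVER (ORDER BY t1.tinyint_col_52 DESC) AS int_col
FROM table_6 t1
INNER JOIN table_14 t2 ON ((t2.decimal0101_col_55) = (t1.decimal0101_col_9));
```

With the bug present, the first EXPLAIN would be expected to fall back to the operator-tree format shown here rather than the "Vertex dependency in root stage" layout.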


        1. HIVE-16421.01.patch (97 kB, Pengcheng Xiong)
        2. HIVE-16421.02.patch (187 kB, Pengcheng Xiong)



            Assignee: Pengcheng Xiong
            Reporter: Pengcheng Xiong

