Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-15117

Partition filters are not pushed down with lateral view and undeterministic UDF

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.1
    • Fix Version/s: 1.2.1
    • Component/s: Hive
    • Labels:
    • Target Version/s:

      Description

      sql with lateral view didn't push down partition column as expected!.

      here is how it can be reproduced.

      1. create test table

      create table test_lateral_view (id bigint,json_cont string) partitioned by (vt string);

      2. explain below sql

      select *
      from test_lateral_view a
      lateral view json_tuple(json_cont, 'iids', 'indexs') b as iids,indexs
      where a.vt = '2016-10-27'
      and rand()>0.5;

      here is my result:

      STAGE DEPENDENCIES:
      Stage-0 is a root stage

      STAGE PLANS:
      Stage: Stage-0
      Fetch Operator
      limit: -1
      Processor Tree:
      TableScan
      alias: a
      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      Lateral View Forward
      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      Select Operator
      expressions: id (type: bigint), json_cont (type: string), vt (type: string)
      outputColumnNames: id, json_cont, vt
      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      Lateral View Join Operator
      outputColumnNames: _col0, _col1, _col2, _col6, _col7
      Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      Filter Operator

      predicate: ((_col2 = '2016-10-27') and (rand() > 0.5)) (type: boolean)

      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      Select Operator
      expressions: _col0 (type: bigint), _col1 (type: string), '2016-10-27' (type: string), _col6 (type: string), _col7 (type: string)
      outputColumnNames: _col0, _col1, _col2, _col3, _col4
      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      ListSink
      Select Operator
      expressions: json_cont (type: string), 'iids' (type: string), 'indexs' (type: string)
      outputColumnNames: _col0, _col1, _col2
      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      UDTF Operator
      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      function name: json_tuple
      Lateral View Join Operator
      outputColumnNames: _col0, _col1, _col2, _col6, _col7
      Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      Filter Operator
      predicate: ((_col2 = '2016-10-27') and (rand() > 0.5)) (type: boolean)
      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      Select Operator
      expressions: _col0 (type: bigint), _col1 (type: string), '2016-10-27' (type: string), _col6 (type: string), _col7 (type: string)
      outputColumnNames: _col0, _col1, _col2, _col3, _col4
      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      ListSink

      As you can see,the partition column is in filter operator,which means this sql will scan the whole table.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              longshangren LongShangRen
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated: