Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-9630

partition filter using date_sub result produces non-optimal plans

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.14.0
    • Fix Version/s: None
    • Component/s: CLI
    • Labels:
      None
    • Environment:

      CentOS 6.5, HDP 2.2

      Description

      We queries which execute many times faster when comparing a date partition value to a date string than when comparing to the results of a date_sub (specifically: date_sub(to_date(from_unixtime(unix_timestamp())), 7) – Looking for seven days earlier).

      For example: The following query requires a MR job to be spun-up and requires 1917 seconds to tell me there are roughly 428 MM rows:

      select count from event_histories eh where eh.adjusted_activity_date_utc > date_sub(to_date(from_unixtime(unix_timestamp())), 7);

      Replacing the "date_sub(to_date(from_unixtime(unix_timestamp())), 7)" with the literal value '2015-02-02' (the value for today) results in no MR job and the result of 428 MM is returned in 0.075 seconds.

      There are queries which are using the same date_sub structure which appear to perform a full scan of the table

      I expect the results of the date_sub function are not classed equivalent to the literal even though there is nothing in the date_sub function which will change during the course of the query.

      Explaining the two queries:

      explain select count from event_histories eh where eh.adjusted_activity_date_utc > date_sub(to_date(from_unixtime(unix_timestamp())), 7);
      OK
      Explain
      STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-0 depends on stages: Stage-1

      STAGE PLANS:
      Stage: Stage-1
      Map Reduce
      Map Operator Tree:
      TableScan
      alias: eh
      Statistics: Num rows: 1993093006 Data size: 901863505920 Basic stats: PARTIAL Column stats: NONE
      Filter Operator
      predicate: (adjusted_activity_date_utc > date_sub(to_date(from_unixtime(unix_timestamp())), 7)) (type: boolean)
      Statistics: Num rows: 664364335 Data size: 300621168489 Basic stats: COMPLETE Column stats: NONE
      Select Operator
      Statistics: Num rows: 664364335 Data size: 300621168489 Basic stats: COMPLETE Column stats: NONE
      Group By Operator
      aggregations: count()
      mode: hash
      outputColumnNames: _col0
      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
      Reduce Output Operator
      sort order:
      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
      value expressions: _col0 (type: bigint)
      Reduce Operator Tree:
      Group By Operator
      aggregations: count(VALUE._col0)
      mode: mergepartial
      outputColumnNames: _col0
      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
      Select Operator
      expressions: _col0 (type: bigint)
      outputColumnNames: _col0
      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
      File Output Operator
      compressed: false
      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
      table:
      input format: org.apache.hadoop.mapred.TextInputFormat
      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

      Stage: Stage-0
      Fetch Operator
      limit: -1
      Processor Tree:
      ListSink

      Time taken: 1.853 seconds, Fetched: 49 row(s)

      explain select count from event_histories eh where eh.adjusted_activity_date_utc > '2015-02-02';
      OK
      Explain
      STAGE DEPENDENCIES:
      Stage-0 is a root stage

      STAGE PLANS:
      Stage: Stage-0
      Fetch Operator
      limit: 1
      Processor Tree:
      ListSink

      Time taken: 0.06 seconds, Fetched: 10 row(s)

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                Humanavatar Robert Miller
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: