Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-12678

BETWEEN relational operator sometimes returns incorrect results against PARQUET tables

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Information Provided
    • Affects Version/s: 1.1.0, 1.2.1
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      When querying a parquet table, the BETWEEN relational operator returns incorrect results when hive.optimize.index.filter and hive.optimize.ppd.storage are enabled

      Create a parquet table:

      create table t(c string) stored as parquet;
      

      Insert some strings representing dates

      insert into t select '2015-12-09' from default.dual limit 1;
      insert into t select '2015-12-10' from default.dual limit 1;
      insert into t select '2015-12-11' from default.dual limit 1;
      

      Example 1

      This query correctly returns 3:

      set hive.optimize.index.filter=true;
      set hive.optimize.ppd.storage=true;
      select count(*) from t where c >= '2015-12-09' and c <= '2015-12-11';
      +------+--+
      | _c0  |
      +------+--+
      | 3    |
      +------+--+
      

      This query incorrectly returns 1:

      set hive.optimize.index.filter=true;
      set hive.optimize.ppd.storage=true;
      select count(*) from t where c between '2015-12-09' and '2015-12-11';
      +------+--+
      | _c0  |
      +------+--+
      | 1    |
      +------+--+
      

      Disabling hive.optimize.findex.filter resolves the problem. This query now correctly returns 3:

      set hive.optimize.index.filter=false;
      set hive.optimize.ppd.storage=true;
      select count(*) from t where c between '2015-12-09' and '2015-12-11';
      +------+--+
      | _c0  |
      +------+--+
      | 3    |
      +------+--+
      

      Disabling hive.optimize.ppd.storage resolves the problem. This query now correctly returns 3:

      set hive.optimize.index.filter=true;
      set hive.optimize.ppd.storage=false;
      select count(*) from t where c between '2015-12-09' and '2015-12-11';
      +------+--+
      | _c0  |
      +------+--+
      | 3    |
      +------+--+
      

      Example 2

      This query correctly returns 1:

      set hive.optimize.index.filter=true;
      set hive.optimize.ppd.storage=true;
      select count(*) from t where c >=  '2015-12-10' and c <= '2015-12-10';
      +------+--+
      | _c0  |
      +------+--+
      | 1    |
      +------+--+
      

      This query incorrectly returns 0:

      set hive.optimize.index.filter=true;
      set hive.optimize.ppd.storage=true;
      select count(*) from t where c between '2015-12-10' and '2015-12-10';
      +------+--+
      | _c0  |
      +------+--+
      | 0    |
      +------+--+
      

      Disabling hive.optimize.findex.filter resolves the problem. This query now correctly returns 1:

      set hive.optimize.index.filter=false;
      set hive.optimize.ppd.storage=true;
      select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10';
      +------+--+
      | _c0  |
      +------+--+
      | 1    |
      +------+--+
      

      Disabling hive.optimize.ppd.storage resolves the problem. This query now correctly returns 1:

      set hive.optimize.index.filter=true;
      set hive.optimize.ppd.storage=false;
      select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10';
      +------+--+
      | _c0  |
      +------+--+
      | 1    |
      +------+--+
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                ctang Chaoyu Tang
                Reporter:
                nbrenwald Nicholas Brenwald
              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: