Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-12678

BETWEEN relational operator sometimes returns incorrect results against PARQUET tables

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Information Provided
    • 1.1.0, 1.2.1
    • None
    • None
    • None

    Description

      When querying a parquet table, the BETWEEN relational operator returns incorrect results when hive.optimize.index.filter and hive.optimize.ppd.storage are enabled

      Create a parquet table:

      create table t(c string) stored as parquet;
      

      Insert some strings representing dates

      insert into t select '2015-12-09' from default.dual limit 1;
      insert into t select '2015-12-10' from default.dual limit 1;
      insert into t select '2015-12-11' from default.dual limit 1;
      

      Example 1

      This query correctly returns 3:

      set hive.optimize.index.filter=true;
      set hive.optimize.ppd.storage=true;
      select count(*) from t where c >= '2015-12-09' and c <= '2015-12-11';
      +------+--+
      | _c0  |
      +------+--+
      | 3    |
      +------+--+
      

      This query incorrectly returns 1:

      set hive.optimize.index.filter=true;
      set hive.optimize.ppd.storage=true;
      select count(*) from t where c between '2015-12-09' and '2015-12-11';
      +------+--+
      | _c0  |
      +------+--+
      | 1    |
      +------+--+
      

      Disabling hive.optimize.findex.filter resolves the problem. This query now correctly returns 3:

      set hive.optimize.index.filter=false;
      set hive.optimize.ppd.storage=true;
      select count(*) from t where c between '2015-12-09' and '2015-12-11';
      +------+--+
      | _c0  |
      +------+--+
      | 3    |
      +------+--+
      

      Disabling hive.optimize.ppd.storage resolves the problem. This query now correctly returns 3:

      set hive.optimize.index.filter=true;
      set hive.optimize.ppd.storage=false;
      select count(*) from t where c between '2015-12-09' and '2015-12-11';
      +------+--+
      | _c0  |
      +------+--+
      | 3    |
      +------+--+
      

      Example 2

      This query correctly returns 1:

      set hive.optimize.index.filter=true;
      set hive.optimize.ppd.storage=true;
      select count(*) from t where c >=  '2015-12-10' and c <= '2015-12-10';
      +------+--+
      | _c0  |
      +------+--+
      | 1    |
      +------+--+
      

      This query incorrectly returns 0:

      set hive.optimize.index.filter=true;
      set hive.optimize.ppd.storage=true;
      select count(*) from t where c between '2015-12-10' and '2015-12-10';
      +------+--+
      | _c0  |
      +------+--+
      | 0    |
      +------+--+
      

      Disabling hive.optimize.findex.filter resolves the problem. This query now correctly returns 1:

      set hive.optimize.index.filter=false;
      set hive.optimize.ppd.storage=true;
      select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10';
      +------+--+
      | _c0  |
      +------+--+
      | 1    |
      +------+--+
      

      Disabling hive.optimize.ppd.storage resolves the problem. This query now correctly returns 1:

      set hive.optimize.index.filter=true;
      set hive.optimize.ppd.storage=false;
      select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10';
      +------+--+
      | _c0  |
      +------+--+
      | 1    |
      +------+--+
      

      Attachments

        Issue Links

          Activity

            People

              ctang Chaoyu Tang
              nbrenwald Nicholas Brenwald
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: