Hive / HIVE-13377

Lost rows when using compact index on parquet table


    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: Indexing
    • Labels: None
    • Environment: Linux, CDH 5.5.0

      Description

      A query with a WHERE clause on a Parquet table loses rows when a compact index is used. The same query returns the correct results without the index.

      create table small_parq(i int) stored as parquet;
      
      insert into table small_parq values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11);
      
      set hive.optimize.index.filter=true;
      set hive.optimize.index.filter.compact.minsize=50;
      
      create index comp_idx on table small_parq (i) as 'compact' WITH DEFERRED REBUILD;
      alter index comp_idx on small_parq rebuild;
      
      select * from small_parq where i=3;
      --this correctly produces 1 row (value 3).
      
      select * from small_parq where i=11;
      --this incorrectly produces 0 rows.
      
      --I see correct results when looking for a row in [1,6];
      --I see bad results when looking for a row in [7,11].
      
      --All is well once I effectively disable the compact index by raising the minimum input size far above the size of this table
      set hive.optimize.index.filter.compact.minsize=50000000;
      select * from small_parq where i=11;
      --now it correctly produces 1 row (value 11).
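
One way to narrow this down might be to look at what the rebuild wrote into the index table and to confirm that the failing query's plan goes through the index. The sketch below is not part of the original report. It assumes the table lives in the default database, so that the index table has the default generated name default__small_parq_comp_idx__, and it assumes the usual compact-index columns _bucketname and _offsets. Check the real index table name with SHOW FORMATTED INDEX before relying on it.

```sql
-- List the index on the base table, including the name of its backing index table.
show formatted index on small_parq;

-- Assumption: default database, default generated index table name.
-- Each row maps an indexed value to a data file and the offsets recorded for it.
select i, `_bucketname`, `_offsets`
from default__small_parq_comp_idx__
order by i;

-- With the index filter enabled, inspect the plan of the query that loses rows.
set hive.optimize.index.filter=true;
set hive.optimize.index.filter.compact.minsize=50;
explain select * from small_parq where i=11;
```

Comparing the offsets recorded for values in [1,6] with those for values in [7,11] may show where the two groups differ. This report does not establish the cause.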
      

      I could not reproduce this issue when the base table was stored as ORC, SEQUENCEFILE, AVRO, or TEXTFILE.
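
For comparison, the same steps can be repeated against a table in one of the other formats. The following is a minimal sketch using ORC. The names small_orc and comp_idx_orc are hypothetical and do not appear in the original report; the statements otherwise mirror the Parquet reproduction above.

```sql
-- Same data as small_parq, but stored as ORC (hypothetical table name).
create table small_orc(i int) stored as orc;

insert into table small_orc values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11);

-- Same session settings as in the Parquet reproduction.
set hive.optimize.index.filter=true;
set hive.optimize.index.filter.compact.minsize=50;

-- Hypothetical index name; same compact index definition as comp_idx.
create index comp_idx_orc on table small_orc (i) as 'compact' WITH DEFERRED REBUILD;
alter index comp_idx_orc on small_orc rebuild;

-- The lookup that loses its row on the Parquet table.
select * from small_orc where i=11;
```

Swapping "stored as orc" for another format should allow the same check for the remaining formats listed above.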


            People

            • Assignee: Unassigned
            • Reporter: Gabriel C Balan (gabriel.balan)
