Hive / HIVE-13377

Lost rows when using compact index on parquet table

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: Indexing
    • Labels: None
    • Environment: linux, cdh 5.5.0

    Description

      A query with a WHERE clause on a Parquet table loses rows when a compact index is used. The same query produces the correct results without the index.

      create table small_parq(i int) stored as parquet;
      
      insert into table small_parq values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11);
      
      set hive.optimize.index.filter=true;
      set hive.optimize.index.filter.compact.minsize=50;
      
      create index comp_idx on table small_parq (i) as 'compact' with deferred rebuild;
      alter index comp_idx on small_parq rebuild;
      
      select * from small_parq where i=3;
      --this correctly produces 1 row (value 3).
      
      select * from small_parq where i=11;
      --this incorrectly produces 0 rows.
      
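      --I see correct results when filtering on a value in [1,6];
      --I see wrong results (0 rows) when filtering on a value in [7,11].
      
      --Results are correct again once the compact index is effectively disabled
      --by raising the minimum input size above the size of the table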
      set hive.optimize.index.filter.compact.minsize=50000000;
      select * from small_parq where i=11;
      --now it correctly produces 1 row (value 11).
      

      I cannot reproduce this issue when the base table is stored as ORC, SEQ, AVRO, or TEXTFILE.
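
      To help narrow this down, the contents of the index table can be inspected to see which file and offsets were recorded for the values that go missing. The following is a minimal sketch, not a confirmed diagnosis. It assumes small_parq lives in the default database and that Hive used its default index-table name, default__small_parq_comp_idx__; both are assumptions and can be checked with the first statement.

```sql
-- Show the index definition, including the name of the backing index table.
show formatted index on small_parq;

-- Assumption: the backing table is default__small_parq_comp_idx__
-- (default database, default naming). Adjust the name if the statement
-- above reports something different.
-- A compact index table holds the indexed column plus the data file name
-- (_bucketname) and the offsets (_offsets) recorded for each value.
select i, `_bucketname`, `_offsets`
from default__small_parq_comp_idx__
order by i;
```

      Comparing the offsets recorded for values in [1,6] with those recorded for values in [7,11] may show whether the two ranges are indexed differently.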

          People

            Assignee: Unassigned
            Reporter: Gabriel C Balan (gabriel.balan)