Project: Hive
Parent issue: HIVE-20699 Query based compactor for full CRUD Acid tables
Issue: HIVE-22474

Query based major compaction always creates only one bucket file


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Hive
    • Labels: None

    Description

      set hive.execution.engine=mr;
      drop table if exists tbl2;
      create table tbl2 (a int, b int) clustered by (a) into 2 buckets stored as ORC TBLPROPERTIES('bucketing_version'='2', 'transactional'='true', 'compactorthreshold.hive.compactor.delta.num.threshold'='3');
      insert into tbl2 values(1,2),(1,3),(1,4),(2,2),(2,3),(2,4);
      insert into tbl2 values(3,2),(3,3),(3,4),(4,2),(4,3),(4,4);
      delete from tbl2 where b = 2;
      insert into tbl2 values(5,2),(5,3),(5,4),(6,2),(6,3),(6,4);
      delete from tbl2 where a = 1;
      

      With the above setup, a query-based major compaction leaves only one bucket file in the base directory, even though the table is clustered into 2 buckets. Before the compaction runs, the delta directories contain the expected number of bucket files, and the rows are split between them correctly.
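      The statements below are a minimal sketch for triggering the compaction and checking the result. They assume a build that includes the query-based compactor from HIVE-20699. They also assume the compactor worker is configured with hive.compactor.crud.query.based=true; verify that property name against your version. The sketch requests a major compaction and waits for it to finish. It then lists the file each remaining row is read from. The expected result is rows spread across two bucket files in the new base directory. With this bug, every row reports the same single bucket file.

        -- Assumption: hive.compactor.crud.query.based=true is set in the
        -- configuration used by the compactor worker (not only in this session).
        alter table tbl2 compact 'major' and wait;
        -- Check that the compaction request reached a finished state.
        show compactions;
        -- INPUT__FILE__NAME is Hive's virtual column holding the file a row was read from.
        select INPUT__FILE__NAME, a, b from tbl2 order by a, b;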

       

    People

    • Assignee: László Pintér (lpinter)
    • Reporter: László Pintér (lpinter)
    • Votes: 0
    • Watchers: 1
