HIVE-20183

Inserting from a bucketed table can cause data loss if the source table contains empty buckets

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0-alpha-1
    • Component/s: Operators
    • Labels: None

    Description

      The issue can be reproduced with the following steps:

      set hive.enforce.bucketing=true;
      set hive.enforce.sorting=true;
      set hive.optimize.bucketingsorting=true;
      
      create table bucket1 (id int, val string) clustered by (id) sorted by (id ASC) INTO 4 BUCKETS;
      insert into bucket1 values (1, 'abc'), (3, 'abc');
      select * from bucket1;
      
      +-------------+--------------+
      | bucket1.id  | bucket1.val  |
      +-------------+--------------+
      | 3           | abc          |
      | 1           | abc          |
      +-------------+--------------+
      
      create table bucket2 like bucket1;
      
      -- Copy all rows into bucket2; both rows (ids 1 and 3) should be present afterwards.
      insert overwrite table bucket2 select * from bucket1;
      select * from bucket2;
      
      +-------------+--------------+
      | bucket2.id  | bucket2.val  |
      +-------------+--------------+
      | 1           | abc          |
      +-------------+--------------+
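
      To see exactly which rows were lost, the two tables can be compared directly. The query below is only a sketch that reuses the table names from the reproduction steps and assumes id is enough to identify a row in this small example. Given the output shown above, it would be expected to list the row with id 3.

      ```sql
      -- List rows that exist in the source table (bucket1)
      -- but have no matching id in the target table (bucket2).
      -- Assumption: id identifies a row in this two-row example.
      select b1.id, b1.val
      from bucket1 b1
      left join bucket2 b2 on b1.id = b2.id
      where b2.id is null;
      ```

      An empty result would indicate that every source row reached the target table.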
      
      

      Attachments

        1. HIVE-20183.2.patch (7 kB, Peter Vary)
        2. HIVE-20183.patch (0.9 kB, Peter Vary)

            People

              Assignee: Peter Vary (pvary)
              Reporter: Peter Vary (pvary)
              Votes: 0
              Watchers: 3
