HIVE-20183: Inserting from a bucketed table can cause data loss if the source table contains empty buckets

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: Operators
    • Labels:
      None

      Description

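      The issue can be reproduced with the statements below. After the insert overwrite, bucket2 contains only the row with id 1; the row with id 3 is lost: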

      set hive.enforce.bucketing=true;
      set hive.enforce.sorting=true;
      set hive.optimize.bucketingsorting=true;
      
      create table bucket1 (id int, val string) clustered by (id) sorted by (id ASC) INTO 4 BUCKETS;
      insert into bucket1 values (1, 'abc'), (3, 'abc');
      select * from bucket1;
      
      +-------------+--------------+
      | bucket1.id  | bucket1.val  |
      +-------------+--------------+
      | 3           | abc          |
      | 1           | abc          |
      +-------------+--------------+
      
      create table bucket2 like bucket1;
      
      insert overwrite table bucket2 select * from bucket1;
      select * from bucket2;
      
      +-------------+--------------+
      | bucket2.id  | bucket2.val  |
      +-------------+--------------+
      | 1           | abc          |
      +-------------+--------------+
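
A few follow-up queries can make the loss easier to confirm. This is only a sketch and assumes the bucket1 and bucket2 tables from the reproduction above exist as shown. INPUT__FILE__NAME is Hive's built-in virtual column for the backing file of each row; the aliases b1, b2, source_rows and target_rows are illustrative names, not part of the original report.

```sql
-- Show which bucket files of the source table actually hold rows.
-- With 2 rows spread over 4 buckets, at least two buckets must be empty.
select INPUT__FILE__NAME, count(*)
from bucket1
group by INPUT__FILE__NAME;

-- Compare row counts; they should be equal if no data was lost.
select count(*) as source_rows from bucket1;
select count(*) as target_rows from bucket2;

-- List the rows present in bucket1 but missing from bucket2.
select b1.id, b1.val
from bucket1 b1
left join bucket2 b2 on b1.id = b2.id
where b2.id is null;
```

Given the outputs shown in the report, the last query would be expected to return the row with id 3 on an affected build and no rows once the fix is applied.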
      
      

        Attachments

        1. HIVE-20183.2.patch
          7 kB
          Peter Vary
        2. HIVE-20183.patch
          0.9 kB
          Peter Vary

              People

              • Assignee: Peter Vary (pvary)
              • Reporter: Peter Vary (pvary)
              • Votes: 0
              • Watchers: 3