Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-23114

Insert overwrite with dynamic partitioning is not working correctly with direct insert

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 4.0.0-alpha-1
    • None
    • None

    Description

      This is a follow-up Jira for the conversation inĀ HIVE-21164
      Doing an insert overwrite from a multi-insert statement with dynamic partitioning will give wrong results for ACID tables when 'hive.acid.direct.insert.enabled' is true or for insert-only tables.

      Reproduction:

      set hive.acid.direct.insert.enabled=true;
      set hive.support.concurrency=true;
      set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
      set hive.vectorized.execution.enabled=false;
      set hive.stats.autogather=false;
      
      create external table multiinsert_test_text (a int, b int, c int) stored as textfile;
      insert into multiinsert_test_text values (1111, 11, 1111), (2222, 22, 1111), (3333, 33, 2222), (4444, 44, NULL), (5555, 55, NULL);
      
      create table multiinsert_test_acid (a int, b int) partitioned by (c int) stored as orc tblproperties('transactional'='true');
      create table multiinsert_test_mm (a int, b int) partitioned by (c int) stored as orc tblproperties('transactional'='true', 'transactional_properties'='insert_only');
      
      from multiinsert_test_text a
      insert overwrite table multiinsert_test_acid partition (c)
      select
       a.a,
       a.b,
       a.c
       where a.c is not null
      insert overwrite table multiinsert_test_acid partition (c)
      select
       a.a,
       a.b,
       a.c
      where a.c is null;
      select * from multiinsert_test_acid;
      
      from multiinsert_test_text a
      insert overwrite table multiinsert_test_mm partition (c)
      select
       a.a,
       a.b,
       a.c
       where a.c is not null
      insert overwrite table multiinsert_test_mm partition (c)
      select
       a.a,
       a.b,
       a.c
      where a.c is null;
      select * from multiinsert_test_mm;
      

      The result of these steps can be different, it depends on the execution order of the FileSinkOperators of the insert overwrite statements. It can happen that an error occurs due to manifest file collision, it can happen that no error occurs but the result will be incorrect.
      Running the same insert query with an external table of with and ACID table with 'hive.acid.direct.insert.enabled=false' will give the follwing result:

      1111    11      1111
      2222    22      1111
      3333    33      2222
      4444    44      NULL
      5555    55      NULL
      

      Attachments

        1. HIVE-23114.1.patch
          111 kB
          Marta Kuczora
        2. HIVE-23114.2.patch
          219 kB
          Marta Kuczora
        3. HIVE-23114.3.patch
          219 kB
          Marta Kuczora
        4. HIVE-23114.3.patch
          219 kB
          Marta Kuczora

        Issue Links

          Activity

            People

              kuczoram Marta Kuczora
              kuczoram Marta Kuczora
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: