Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-16832

duplicate ROW__ID possible in multi insert into transactional table

Log workAgile BoardRank to TopRank to BottomBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 2.2.0
    • 3.0.0
    • Transactions
    • None

    Description

       create table AcidTablePart(a int, b int) partitioned by (p string) clustered by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
       create temporary table if not exists data1 (x int);
       insert into data1 values (1);
       from data1
         insert into AcidTablePart partition(p) select 0, 0, 'p' || x
         insert into AcidTablePart partition(p='p1') select 0, 1
      

      Each branch of this multi-insert create a row in partition p1/bucket0 with ROW__ID=(1,0,0).
      The same can happen when running SQL Merge (HIVE-10924) statement that has both Insert and Update clauses when target table has 'transactional'='true','transactional_properties'='default' (see HIVE-14035). This is so because Merge is internally run as a multi-insert statement.

      The solution relies on statement ID introduced in HIVE-11030. Each Insert clause of a multi-insert is gets a unique ID.
      The ROW__ID.bucketId now becomes a bit packed triplet (format version, bucketId, statementId).
      (Since ORC stores field names in the data file we can't rename ROW__ID.bucketId).
      This ensures that there are no collisions and retains desired sort properties of ROW__ID.
      In particular SortedDynPartitionOptimizer works w/o any changes even in cases where there fewer reducers than buckets.

      Attachments

        1. HIVE-16832.22.patch
          127 kB
          Eugene Koifman
        2. HIVE-16832.21.patch
          121 kB
          Eugene Koifman
        3. HIVE-16832.20.patch
          121 kB
          Eugene Koifman
        4. HIVE-16832.20.patch
          121 kB
          Eugene Koifman
        5. HIVE-16832.19.patch
          117 kB
          Eugene Koifman
        6. HIVE-16832.18.patch
          105 kB
          Eugene Koifman
        7. HIVE-16832.17.patch
          100 kB
          Eugene Koifman
        8. HIVE-16832.16.patch
          95 kB
          Eugene Koifman
        9. HIVE-16832.15.patch
          87 kB
          Eugene Koifman
        10. HIVE-16832.14.patch
          34 kB
          Eugene Koifman
        11. HIVE-16832.11.patch
          227 kB
          Eugene Koifman
        12. HIVE-16832.10.patch
          128 kB
          Eugene Koifman
        13. HIVE-16832.09.patch
          114 kB
          Eugene Koifman
        14. HIVE-16832.08.patch
          113 kB
          Eugene Koifman
        15. HIVE-16832.06.patch
          95 kB
          Eugene Koifman
        16. HIVE-16832.05.patch
          95 kB
          Eugene Koifman
        17. HIVE-16832.04.patch
          84 kB
          Eugene Koifman
        18. HIVE-16832.03.patch
          78 kB
          Eugene Koifman
        19. HIVE-16832.01.patch
          32 kB
          Eugene Koifman

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            ekoifman Eugene Koifman Assign to me
            ekoifman Eugene Koifman
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment