Hive / HIVE-16177

Non-Acid to Acid conversion doesn't handle _copy_N files


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 2.4.0, 3.0.0
    • Component/s: Transactions
    • Labels:
      None

      Description

      create table T(a int, b int) clustered by (a) into 2 buckets stored as orc TBLPROPERTIES('transactional'='false');
      insert into T(a,b) values(1,2);
      insert into T(a,b) values(1,3);
      alter table T SET TBLPROPERTIES ('transactional'='true');
      

      We should now have bucket files 000001_0 and 000001_0_copy_1.

      However, OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that copy_N files can exist. It numbers the rows of each file in the bucket starting from 0, which generates duplicate IDs:

      select ROW__ID, INPUT__FILE__NAME, a, b from T;
      

      produces

      {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/000001_0,1,2
      {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/000001_0_copy_1,1,3
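Both rows above get rowid 0 because numbering restarts in every file. One possible way to keep ROW__IDs unique, sketched here under stated assumptions, is to number rows continuously across all files of a bucket: visit the files in copy order (000001_0, then 000001_0_copy_1, ...) and start each file's numbering at the total row count of the files before it. The class and method names (OriginalRowIdSketch, assignRowIds) and the input shape (per-file row counts already sorted by copy number) are hypothetical; this is not the code of OrcRawRecordMerger or of the committed patch. In practice the per-file row counts would have to come from somewhere, for example each ORC file's footer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not Hive's OrcRawRecordMerger implementation.
public class OriginalRowIdSketch {

    /**
     * Returns the rowids assigned to the rows of each file in one bucket.
     * rowCountsInCopyOrder.get(i) is the row count of the i-th file, where the
     * files are assumed to be ordered 000001_0, 000001_0_copy_1, 000001_0_copy_2, ...
     */
    public static List<long[]> assignRowIds(List<Long> rowCountsInCopyOrder) {
        List<long[]> result = new ArrayList<>();
        long offset = 0; // rows already numbered in earlier files of this bucket
        for (long rowCount : rowCountsInCopyOrder) {
            long[] ids = new long[(int) rowCount];
            for (int i = 0; i < rowCount; i++) {
                ids[i] = offset + i; // continue numbering instead of restarting at 0
            }
            result.add(ids);
            offset += rowCount;
        }
        return result;
    }

    public static void main(String[] args) {
        // The two files from the example each hold one row.
        List<long[]> ids = assignRowIds(List.of(1L, 1L));
        for (int f = 0; f < ids.size(); f++) {
            System.out.println("file " + f + " -> " + Arrays.toString(ids.get(f)));
        }
    }
}
```

Running main prints `file 0 -> [0]` and `file 1 -> [1]`, so the row from 000001_0_copy_1 would get rowid 1 instead of a second 0. The offsets only stay stable if every reader (and the compactor) visits the copy_N files in the same order.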
      

      [~owen.omalley], do you have any thoughts on a good way to handle this?

      The attached patch makes a few changes so that Acid recognizes copy_N files at all, but this is only a prerequisite. The new unit test demonstrates the issue.
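Recognizing copy_N files starts with parsing the names of original (pre-Acid) bucket files. The sketch below assumes names of the form `<bucket>_<n>` with an optional `_copy_<N>` suffix, as in 000001_0 and 000001_0_copy_1, treats the leading number as the bucket id (consistent with bucketid 1 for 000001_0 in the output above), and treats a missing suffix as copy 0. The class name, regex, and method are hypothetical and are not taken from Hive's AcidUtils.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of parsing original bucket file names; not Hive's AcidUtils.
public class OriginalFileName {
    // <bucket>_<n> with an optional _copy_<N> suffix, e.g. 000001_0 or 000001_0_copy_1
    private static final Pattern ORIGINAL =
        Pattern.compile("(\\d+)_(\\d+)(?:_copy_(\\d+))?");

    public final int bucketId;
    public final int copyNumber; // 0 for the file without a _copy_N suffix

    private OriginalFileName(int bucketId, int copyNumber) {
        this.bucketId = bucketId;
        this.copyNumber = copyNumber;
    }

    /** Returns null if the name is not an original bucket file name. */
    public static OriginalFileName parse(String fileName) {
        Matcher m = ORIGINAL.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        int bucket = Integer.parseInt(m.group(1)); // "000001" -> 1
        int copy = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        return new OriginalFileName(bucket, copy);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"000001_0", "000001_0_copy_1", "bucket_00001"}) {
            OriginalFileName parsed = parse(name);
            System.out.println(name + " -> " + (parsed == null
                ? "not an original file"
                : "bucket " + parsed.bucketId + ", copy " + parsed.copyNumber));
        }
    }
}
```

With the sample names, main reports bucket 1 / copy 0, bucket 1 / copy 1, and rejects bucket_00001, which is an Acid-style bucket name rather than an original one.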

      Furthermore,

      alter table T compact 'major';
      select ROW__ID, INPUT__FILE__NAME, a, b from T order by b;
      

      produces

      {"transactionid":0,"bucketid":1,"rowid":0}	file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands....warehouse/nonacidorctbl/base_-9223372036854775808/bucket_00001	1	2
      

      HIVE-16177.04.patch includes TestTxnCommands.testNonAcidToAcidConversion0(), which demonstrates this.

      This is because the compactor doesn't handle copy_N files either: it skips them, so the (1,3) row from 000001_0_copy_1 is missing from the compacted base.
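A compactor that does not skip copy_N files has to collect every original file of a bucket and read them in a fixed order. The sketch below, which is hypothetical and not Hive's compactor code, groups file names by bucket id and sorts each group by copy number. It assumes the same `<bucket>_<n>[_copy_<N>]` naming as the example and ignores names that do not match (for instance base_N or delta directories).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not Hive's compactor implementation.
public class BucketFileGrouper {
    // <bucket>_<n> with an optional _copy_<N> suffix
    private static final Pattern ORIGINAL =
        Pattern.compile("(\\d+)_(\\d+)(?:_copy_(\\d+))?");

    /** Maps bucket id -> original file names of that bucket, ordered by copy number. */
    public static Map<Integer, List<String>> groupByBucket(List<String> fileNames) {
        Map<Integer, List<String>> byBucket = new TreeMap<>();
        Map<String, Integer> copyOf = new HashMap<>();
        for (String name : fileNames) {
            Matcher m = ORIGINAL.matcher(name);
            if (!m.matches()) {
                continue; // not an original bucket file
            }
            int bucket = Integer.parseInt(m.group(1));
            copyOf.put(name, m.group(3) == null ? 0 : Integer.parseInt(m.group(3)));
            byBucket.computeIfAbsent(bucket, b -> new ArrayList<>()).add(name);
        }
        for (List<String> files : byBucket.values()) {
            // Keep every copy_N file, ordered numerically: _0, _copy_1, _copy_2, ...
            files.sort(Comparator.comparingInt((String f) -> copyOf.get(f)));
        }
        return byBucket;
    }

    public static void main(String[] args) {
        List<String> names = List.of("000001_0_copy_1", "000000_0", "000001_0", "base_5");
        System.out.println(groupByBucket(names));
    }
}
```

main prints `{0=[000000_0], 1=[000001_0, 000001_0_copy_1]}`. Sorting on the parsed number rather than on the string keeps copy_10 after copy_2, which matters if row numbering is derived from this order.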

        Attachments

        1. HIVE-16177.01.patch
          8 kB
          Eugene Koifman
        2. HIVE-16177.02.patch
          9 kB
          Eugene Koifman
        3. HIVE-16177.04.patch
          43 kB
          Eugene Koifman
        4. HIVE-16177.07.patch
          79 kB
          Eugene Koifman
        5. HIVE-16177.08.patch
          85 kB
          Eugene Koifman
        6. HIVE-16177.09.patch
          97 kB
          Eugene Koifman
        7. HIVE-16177.10.patch
          96 kB
          Eugene Koifman
        8. HIVE-16177.11.patch
          96 kB
          Eugene Koifman
        9. HIVE-16177.14.patch
          74 kB
          Eugene Koifman
        10. HIVE-16177.15.patch
          100 kB
          Eugene Koifman
        11. HIVE-16177.16.patch
          100 kB
          Eugene Koifman
        12. HIVE-16177.17.patch
          96 kB
          Eugene Koifman
        13. HIVE-16177.18.patch
          96 kB
          Eugene Koifman
        14. HIVE-16177.18-branch-2.patch
          95 kB
          Eugene Koifman
        15. HIVE-16177.19-branch-2.patch
          97 kB
          Eugene Koifman
        16. HIVE-16177.20-branch-2.patch
          97 kB
          Eugene Koifman


              People

              • Assignee:
                ekoifman Eugene Koifman
                Reporter:
                ekoifman Eugene Koifman
              • Votes:
                0
                Watchers:
                7
