Hive / HIVE-16177 non Acid to acid conversion doesn't handle _copy_N files / HIVE-16722

Converting bucketed non-acid table to acid should perform validation

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Transactions
    • Labels: None

    Description

      Converting a non-acid table to acid currently performs only metadata validation (in TransactionalValidationListener).
      The data read code path understands only certain directory layouts and file names, and generally ignores files that don't match the expected format.

      In Hive, directory layout and bucket file naming are poorly enforced, especially in older releases.

      A validation step should be added to

      alter table T SET TBLPROPERTIES ('transactional'='true')

      that scans the file system and reports any possible data loss scenarios.

      Currently Acid understands bucket file names like "00000_0" and (with HIVE-16177) "00000_0_copy_1" etc. at the root of the partition.
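
      To make the proposed check concrete, here is a minimal sketch of a file system scan that lists entries a reader expecting the names above would skip. It is not the code from the attached patches: the class BucketLayoutCheck, its methods, and the file name pattern are hypothetical, and it uses java.nio on a local directory rather than the Hadoop FileSystem API. It also does not special-case hidden or temporary files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Hive.
class BucketLayoutCheck {
    // Assumed naming rule: digits, underscore, digits, optional "_copy_N" suffix,
    // e.g. "00000_0" or "00000_0_copy_1".
    private static final Pattern BUCKET_FILE =
        Pattern.compile("^[0-9]+_[0-9]+(_copy_[0-9]+)?$");

    static boolean isRecognizedBucketFile(String name) {
        return BUCKET_FILE.matcher(name).matches();
    }

    // Returns the sorted names of entries directly under partitionDir that do
    // not look like bucket files: subdirectories and files with other names.
    static List<String> findUnrecognized(Path partitionDir) throws IOException {
        List<String> problems = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(partitionDir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (Files.isDirectory(entry) || !isRecognizedBucketFile(name)) {
                    problems.add(name);
                }
            }
        }
        Collections.sort(problems);
        return problems;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: BucketLayoutCheck <partition-dir>");
            return;
        }
        // Report every entry that could be silently ignored after conversion.
        for (String name : findUnrecognized(Paths.get(args[0]))) {
            System.out.println("unrecognized entry: " + name);
        }
    }
}
```

      A conversion could refuse to proceed when findUnrecognized returns a non-empty list, so the user can fix the layout before any data is silently ignored.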

      Attachments

        1. HIVE-16722.04.patch
          6 kB
          Eugene Koifman
        2. HIVE-16722.03.patch
          6 kB
          Eugene Koifman
        3. HIVE-16722.02.patch
          6 kB
          Eugene Koifman
        4. HIVE-16722.01.patch
          5 kB
          Eugene Koifman
        5. HIVE-16722.WIP.patch
          6 kB
          Eugene Koifman

            People

              Assignee: ekoifman Eugene Koifman
              Reporter: ekoifman Eugene Koifman
              Votes: 0
              Watchers: 5
