  1. Hive
  2. HIVE-22639

Bucket file name does not match bucket id after query based major compaction

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: 3.1.0, 3.0.0
    • Fix Version/s: None
    • Component/s: Hive
    • Labels: None

    Description

      While debugging TestCrudCompactorOnTez#testCompactionWithSchemaEvolutionAndBuckets(), I noticed that the single bucket file in the delta directories is named bucket_00001 before compaction, but the single bucket file in the new base is named bucket_00000. Meanwhile, the bucket value in the ROW__ID of the records remains unchanged and still indicates that the bucket id is 1.
      So after compaction the bucket id and the file name do not match. This could lead to problems.

      The test itself does not reveal this issue, although I think the tests should check for it as well. At the same time, the tests assert an exact bucket id value in cases where it cannot be predicted, and so they fail even though the bucket id does not change after compaction, in which case the check should really pass.
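
      A check of the kind suggested above could be sketched as follows. This is only an illustration, not code from the attached patches: the class BucketNameCheck and its methods are hypothetical, and the bit layout used to decode the bucket id from the ROW__ID bucket value (bucket id in bits 16 to 27, as in the version 1 encoding) is an assumption. A real test would more likely rely on Hive's own codec than on hand-written bit masks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: compares the bucket id implied by a
// bucket file name with the bucket id encoded in the ROW__ID bucket value.
final class BucketNameCheck {
    // Bucket files are named like bucket_00001.
    private static final Pattern BUCKET_FILE = Pattern.compile("^bucket_(\\d+)");

    /** Parses the bucket id from a file name such as "bucket_00001". */
    static int bucketIdFromFileName(String fileName) {
        Matcher m = BUCKET_FILE.matcher(fileName);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a bucket file: " + fileName);
        }
        return Integer.parseInt(m.group(1));
    }

    /** ASSUMPTION: bits 16..27 of the bucket value hold the bucket id. */
    static int bucketIdFromRowId(int bucketProperty) {
        return (bucketProperty >>> 16) & 0xFFF;
    }

    /** True when the file name and the ROW__ID bucket value agree. */
    static boolean matches(String fileName, int bucketProperty) {
        return bucketIdFromFileName(fileName) == bucketIdFromRowId(bucketProperty);
    }

    public static void main(String[] args) {
        // 0x20010000: decodes to bucket id 1 under the assumed layout.
        int bucketProperty = 536936448;
        // Delta file before compaction.
        System.out.println(matches("bucket_00001", bucketProperty));
        // Base file after compaction, as described in this issue.
        System.out.println(matches("bucket_00000", bucketProperty));
    }
}
```

      Under these assumptions, the first call reports a match and the second reports the mismatch described in this issue. Comparing the file name against the decoded id, rather than asserting a fixed bucket id, would also avoid the failures in cases where the id cannot be predicted.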

      Attachments

        1. HIVE-22639.patch
          12 kB
          Aron Hamvas
        2. HIVE-22639.1.patch
          12 kB
          Aron Hamvas
        3. HIVE-22639.2.patch
          12 kB
          Aron Hamvas

          People

            Assignee: Aron Hamvas (hamvas.aron)
            Reporter: Aron Hamvas (hamvas.aron)
            Votes: 0
            Watchers: 2