  1. Hive
  2. HIVE-22639

Bucket file name does not match bucket id after query based major compaction

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: 3.1.0, 3.0.0
    • Fix Version/s: None
    • Component/s: Hive
    • Labels:
      None

      Description

      While debugging TestCrudCompactorOnTez#testCompactionWithSchemaEvolutionAndBuckets(), I noticed that before compaction the single bucket file in the delta directories is named bucket_00001, but in the new base the single bucket file is named bucket_00000. Meanwhile, the bucket value in the ROW__ID of the records remains the same and indicates that the bucket id is 1.
      So the bucket id and the file name do not match. This could lead to problems.
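
      The mismatch described above can be illustrated with a minimal, standalone sketch. It does not use any Hive classes. The class and method names are hypothetical, the file-name pattern is taken from the names quoted in this issue, and the bit layout of the ROW__ID bucket value (bucket id in bits 16-27, as in the V1 bucket encoding) is an assumption that should be checked against the Hive version in use.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: compares the bucket id implied by a
// bucket file name with the bucket id encoded in a ROW__ID bucket value.
public class BucketNameCheck {
    // Assumed file-name convention, based on the names in this issue: bucket_NNNNN
    private static final Pattern BUCKET_FILE = Pattern.compile("bucket_(\\d+)");

    // Extracts the numeric bucket id from a file name such as "bucket_00001".
    static int bucketIdFromFileName(String fileName) {
        Matcher m = BUCKET_FILE.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a bucket file name: " + fileName);
        }
        return Integer.parseInt(m.group(1));
    }

    // Assumed V1 layout: top 3 bits version, 1 reserved bit, 12 bits bucket id,
    // 4 reserved bits, 12 bits statement id. The bucket id sits in bits 16-27.
    static int bucketIdFromRowIdProperty(int bucketProperty) {
        return (bucketProperty >>> 16) & 0xFFF;
    }

    // True when the file name and the ROW__ID bucket value agree on the bucket id.
    static boolean isConsistent(String fileName, int bucketProperty) {
        return bucketIdFromFileName(fileName) == bucketIdFromRowIdProperty(bucketProperty);
    }

    public static void main(String[] args) {
        // Version 1, bucket id 1, statement id 0 under the assumed layout.
        int bucketProperty = (1 << 29) | (1 << 16);
        // Before compaction: delta file name matches the ROW__ID bucket id.
        System.out.println(isConsistent("bucket_00001", bucketProperty));
        // After compaction: base file name no longer matches.
        System.out.println(isConsistent("bucket_00000", bucketProperty));
    }
}
```

      A check along these lines could be run against each bucket file in the compacted base to detect the situation reported here.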

      The test itself does not reveal this issue, although I think the tests should check for it as well. At the same time, the tests assert an exact bucket id value in cases where it cannot be predicted, and they fail even though the bucket id does not change after compaction, so the check should really pass.

        Attachments

        1. HIVE-22639.1.patch
          12 kB
          Aron Hamvas
        2. HIVE-22639.2.patch
          12 kB
          Aron Hamvas
        3. HIVE-22639.patch
          12 kB
          Aron Hamvas

            People

            • Assignee:
              hamvas.aron Aron Hamvas
              Reporter:
              hamvas.aron Aron Hamvas
