  1. Apache Gobblin
  2. GOBBLIN-2105

Ensure the destination path does not exist before renaming during Gobblin compaction.

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: gobblin-compaction
    • Labels: None

    Description

      As part of Gobblin compaction (deduplication), compacted files are moved from staging to their final location at the end of the process. This movement is handled by the org.apache.gobblin.compaction.action.CompactionCompleteFileOperationAction#onCompactionJobComplete method, which determines the appropriate destination path and moves the compacted files accordingly.

      Current Issue:

      • If compaction.rename.source.dir.enabled is set to false (not in append mode) and recompaction.write.to.new.folder is set to true, the destination is a new directory chosen from the execution count stored in the state file.
      • The state file, however, is written only after the move to the final location. If the move fails, the state file no longer reflects what is actually on disk.
      • On the next execution, the computed destination path may therefore already exist. Because an HDFS rename moves the source into the destination when the destination is an existing directory, the compacted output ends up in an extra child directory.
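
The nesting described in the last bullet can be reproduced with a small model. This is only a sketch: it uses java.nio.file on a local temp directory rather than the Hadoop FileSystem API, the helper hdfsStyleRename is a hypothetical stand-in that imitates the "move into an existing directory" rule, and the names staging, compaction_1 and part-0.avro are illustrative rather than taken from Gobblin.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RenameNestingDemo {

    // Hypothetical model of the rename rule described in the issue:
    // if dst already exists as a directory, src is moved INTO it as
    // dst/<src name>; otherwise src simply becomes dst.
    static Path hdfsStyleRename(Path src, Path dst) throws IOException {
        Path target = Files.isDirectory(dst) ? dst.resolve(src.getFileName()) : dst;
        return Files.move(src, target);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("compaction-demo");

        // Compacted output waiting in staging.
        Path staging = Files.createDirectory(root.resolve("staging"));
        Files.createFile(staging.resolve("part-0.avro"));

        // Leftover from an earlier failed run: the destination already exists.
        Path dest = Files.createDirectory(root.resolve("compaction_1"));

        Path landed = hdfsStyleRename(staging, dest);
        System.out.println("Output landed in: " + root.relativize(landed));
    }
}
```

In this model the file ends up under compaction_1/staging instead of directly under compaction_1, which is the extra child directory the issue describes.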

      Requirement:

      The compaction job must verify that the computed destination path does not exist before performing the rename operation.
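
One way to express this requirement is a precondition check immediately before the rename. The sketch below is an assumption about the shape of such a guard, not the fix adopted in Gobblin: it again uses java.nio.file instead of the Hadoop FileSystem API, renameIfAbsent is a hypothetical helper, and the issue does not say whether an existing destination should be rejected, deleted, or replaced by a different path. Here it is rejected.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class GuardedRename {

    // Hypothetical guard: refuse to rename when dst already exists, so the
    // compacted output can never be nested inside a leftover directory.
    // The check and the move are two separate steps, so this is not atomic.
    static void renameIfAbsent(Path src, Path dst) throws IOException {
        if (Files.exists(dst)) {
            throw new FileAlreadyExistsException(dst.toString());
        }
        Files.move(src, dst);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("compaction-guard");
        Path staging = Files.createDirectory(root.resolve("staging"));
        Files.createFile(staging.resolve("part-0.avro"));

        // Simulate a destination left behind by a failed earlier run.
        Path dest = Files.createDirectory(root.resolve("compaction_1"));

        try {
            renameIfAbsent(staging, dest);
        } catch (FileAlreadyExistsException e) {
            System.out.println("Destination already exists, rename skipped: " + e.getMessage());
        }
    }
}
```

Failing fast leaves the staging data untouched, so the caller can decide how to recover. Because the existence check and the move are not atomic, a concurrent writer could still create the destination between the two steps.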

          People

            ibuenros Issac Buenrostro
            arpit.varshney Arpit Varshney

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 10m