Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
None
None
None
Description
As part of Gobblin compaction (deduplication), compacted files are moved from staging to their final location at the end of the process. This movement is handled by the org.apache.gobblin.compaction.action.CompactionCompleteFileOperationAction#onCompactionJobComplete method, which determines the appropriate destination path and moves the compacted files accordingly.
Current Issue:
- If compaction.rename.source.dir.enabled is set to false (i.e., not in append mode) and recompaction.write.to.new.folder is set to true, the destination is a new directory chosen based on the execution count read from the state file.
- However, the state file is written only after the files have been moved to the final location. If the move fails, the state file is not updated, so the execution count it records no longer matches what is on disk.
- On the next execution, the same destination path is computed again and may already exist (left over from the failed run). Because HDFS rename moves the source into the destination as a child when the destination is an existing directory, the compacted files end up in an extra nested directory instead of at the intended path.
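To make the failure mode concrete, the toy model below imitates HDFS directory rename semantics in memory: renaming to a non-existent path lands exactly there, while renaming to an existing directory nests the source under it. This is an illustrative sketch, not Hadoop code; the class name and the paths (/staging/run, /final/3) are hypothetical.

```java
import java.util.TreeSet;

/**
 * Hypothetical in-memory model of HDFS-style directory rename semantics.
 * Not Hadoop code: it only mimics how rename(src, dst) behaves when dst
 * is (or is not) an existing directory.
 */
public class HdfsRenameModel {
    // Existing directory paths, e.g. "/final/3".
    private final TreeSet<String> dirs = new TreeSet<>();

    public void mkdirs(String path) {
        dirs.add(path);
    }

    public boolean exists(String path) {
        return dirs.contains(path);
    }

    /** Renames src and returns the path where it actually ended up. */
    public String rename(String src, String dst) {
        // If dst already exists as a directory, src is moved INTO it as a child.
        String target = exists(dst) ? dst + "/" + baseName(src) : dst;

        // Collect src and everything below it, then re-root those paths at target.
        TreeSet<String> moved = new TreeSet<>();
        for (String d : dirs) {
            if (d.equals(src) || d.startsWith(src + "/")) {
                moved.add(d);
            }
        }
        for (String d : moved) {
            dirs.remove(d);
            dirs.add(target + d.substring(src.length()));
        }
        return target;
    }

    private static String baseName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        HdfsRenameModel fs = new HdfsRenameModel();

        // First run: the destination does not exist, so the rename lands exactly there.
        fs.mkdirs("/staging/run");
        System.out.println(fs.rename("/staging/run", "/final/3")); // /final/3

        // Next run reuses the same (stale) destination, which now exists: output is nested.
        fs.mkdirs("/staging/run");
        System.out.println(fs.rename("/staging/run", "/final/3")); // /final/3/run
    }
}
```

The second rename reproduces the extra child directory described above, which is why the destination's existence has to be checked before renaming.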
Requirement:
Ensure that the computed destination path does not exist before the rename operation is performed.
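One possible way to satisfy this requirement is sketched below: starting from the count derived from the state file, skip any destination directory that already exists and move the compacted output to the first free one. This is an assumption-laden sketch, not Gobblin's implementation. It uses java.nio.file on the local filesystem as a stand-in for Hadoop's FileSystem (which offers comparable exists/rename calls), and the "<base>/<count>" layout and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical "destination must not exist" guard. java.nio.file stands in
 * for Hadoop's FileSystem; the "<base>/<count>" layout is an assumption.
 */
public class DestinationGuard {

    /**
     * Starting from the candidate count derived from the state file, returns
     * the first "<base>/<count>" directory that does not exist yet.
     */
    public static Path resolveFreshDestination(Path base, int candidateCount) {
        int count = candidateCount;
        Path dst = base.resolve(Integer.toString(count));
        // A leftover directory from a failed run means the count is stale: skip it.
        while (Files.exists(dst)) {
            count++;
            dst = base.resolve(Integer.toString(count));
        }
        return dst;
    }

    /** Moves src to a destination that was verified not to exist; returns that destination. */
    public static Path moveToFreshDestination(Path src, Path base, int candidateCount)
            throws IOException {
        Path dst = resolveFreshDestination(base, candidateCount);
        Files.createDirectories(base);
        // Without REPLACE_EXISTING, Files.move throws if dst appeared in the meantime.
        return Files.move(src, dst);
    }
}
```

The existence check and the move are not atomic, so a concurrent writer could still create the path in between. On the local filesystem Files.move then fails instead of nesting, but HDFS rename would nest, so a real fix may prefer to fail fast or to clean up the leftover directory rather than skip ahead to the next count.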