Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
2.9.0
None
Description
DistCP issues a delete(file) request even if is underneath an already deleted directory. This generates needless load on filesystems/object stores, and, if the store throttles delete, can dramatically slow down the delete operation.
If the distcp delete operation can build a history of deleted directories, then it will know when it does not need to issue those deletes.
Care is needed here to make sure that whatever structure is created does not overload the heap of the process.
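A minimal sketch in Java of the idea above, assuming the paths to delete are visited in sorted order, so that every directory is processed before its descendants. `DeletedDirTrackerSketch`, `shouldDelete` and `directoryDeleted` are hypothetical names for illustration, not the DistCp API. The heap concern is handled here by capping the history with a size-bounded LRU `LinkedHashMap`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: remembers recently deleted directories in a size-bounded LRU map. */
public class DeletedDirTrackerSketch {
    private final Map<String, Boolean> deletedDirs;

    public DeletedDirTrackerSketch(final int maxEntries) {
        // accessOrder = true orders entries least-recently-accessed first, so
        // removeEldestEntry evicts the LRU directory once the cap is exceeded.
        this.deletedDirs = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns false if a tracked ancestor of this absolute path was already deleted. */
    public boolean shouldDelete(String path) {
        for (String p = parentOf(path); p != null; p = parentOf(p)) {
            if (deletedDirs.get(p) != null) { // get() also refreshes the LRU position
                return false;
            }
        }
        return true;
    }

    /** Record a directory whose recursive delete has been issued. */
    public void directoryDeleted(String dir) {
        deletedDirs.put(dir, Boolean.TRUE);
    }

    /** Parent of an absolute path: "/a/b" -> "/a", "/a" -> "/", "/" -> null. */
    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        if (slash <= 0) {
            return (slash == 0 && path.length() > 1) ? "/" : null;
        }
        return path.substring(0, slash);
    }

    public static void main(String[] args) {
        // Hypothetical target paths missing from the source, sorted so parents precede children.
        List<String> missing = List.of("/out/a", "/out/a/f1", "/out/a/sub", "/out/a/sub/f2", "/out/b.txt");
        Set<String> dirs = Set.of("/out/a", "/out/a/sub");
        DeletedDirTrackerSketch tracker = new DeletedDirTrackerSketch(1000);
        List<String> issued = new ArrayList<>();
        for (String p : missing) {
            if (!tracker.shouldDelete(p)) {
                continue; // an ancestor is already gone; no request needed
            }
            issued.add(p); // stand-in for a FileSystem.delete(path, true) call
            if (dirs.contains(p)) {
                tracker.directoryDeleted(p);
            }
        }
        System.out.println(issued); // prints [/out/a, /out/b.txt]
    }
}
```

Evicting a directory from the bounded history can only cause a redundant delete() of one of its descendants later, never a skipped one, so the cap trades a few extra requests for a fixed memory ceiling.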
Attachments
Issue Links
- contains: HADOOP-15208 DistCp to offer -xtrack <path> option to save src/dest filesets as alternative to delete() (Resolved)
- duplicates: HADOOP-15208 DistCp to offer -xtrack <path> option to save src/dest filesets as alternative to delete() (Resolved)
- is related to: HADOOP-15292 Distcp's use of pread is slowing it down. (Resolved)
- is related to: HADOOP-15281 Distcp to add no-rename copy option (Resolved)
- supercedes: HADOOP-15191 Add Private/Unstable BulkDelete operations to supporting object stores for DistCP (Resolved)