Hadoop Common / HADOOP-15209

DistCp to eliminate needless deletion of files under already-deleted directories

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9.0
    • Fix Version/s: 3.1.0
    • Component/s: tools/distcp
    • Labels: None

    Description

      DistCp issues a delete(file) request even if the file is underneath an already-deleted directory. This generates needless load on filesystems and object stores and, if the store throttles delete requests, can dramatically slow down the delete operation.

      If the DistCp delete operation builds a history of deleted directories, it will know when it does not need to issue those deletes.

      Care is needed here to make sure that whatever structure is created does not overload the heap of the process.
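
      The idea above can be illustrated with a minimal sketch. This is not the code from the attached patches: the class name DeletedDirHistory, the method shouldDelete, and the choice of an LRU-bounded map are all assumptions made for illustration. The sketch remembers recently deleted directories in a fixed-size cache, so heap use stays bounded, and skips any path that has a remembered ancestor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not from the HADOOP-15209 patches): keeps a bounded
 * history of deleted directories so deletes under them can be skipped.
 */
class DeletedDirHistory {
  private final Map<String, Boolean> deletedDirs;

  DeletedDirHistory(final int maxEntries) {
    // Access-ordered LinkedHashMap: once the size limit is exceeded, the
    // least recently used directory is evicted, which caps heap usage.
    this.deletedDirs = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
        return size() > maxEntries;
      }
    };
  }

  /**
   * Returns true if a delete request should be issued for this path,
   * false if a remembered ancestor directory was already deleted.
   */
  boolean shouldDelete(String path, boolean isDirectory) {
    if (isUnderDeletedDir(path)) {
      return false;
    }
    if (isDirectory) {
      // The caller is expected to delete this directory recursively.
      deletedDirs.put(path, Boolean.TRUE);
    }
    return true;
  }

  /** Walks up the path, checking each ancestor below the root. */
  private boolean isUnderDeletedDir(String path) {
    int slash = path.lastIndexOf('/');
    while (slash > 0) {
      String ancestor = path.substring(0, slash);
      if (deletedDirs.get(ancestor) != null) {
        return true;
      }
      slash = path.lastIndexOf('/', slash - 1);
    }
    return false;
  }
}
```

      In this sketch, evicting an entry is safe: the worst case is that a delete is issued for a path that is already gone, which is the behaviour the issue describes today. The cache size therefore trades heap use against the number of redundant requests.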

      Attachments

        1. HADOOP-15209-001.patch (31 kB, Steve Loughran)
        2. HADOOP-15209-002.patch (85 kB, Steve Loughran)
        3. HADOOP-15209-003.patch (85 kB, Steve Loughran)
        4. HADOOP-15209-004.patch (87 kB, Steve Loughran)
        5. HADOOP-15209-005.patch (88 kB, Steve Loughran)
        6. HADOOP-15209-006.patch (88 kB, Steve Loughran)
        7. HADOOP-15209-007.patch (92 kB, Steve Loughran)

          People

            Assignee: Steve Loughran (stevel@apache.org)
            Reporter: Steve Loughran (stevel@apache.org)
            Votes: 0
            Watchers: 10
