Hadoop Common / HADOOP-17531

DistCp: Reduce memory usage on copying huge directories


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.1, 3.4.0
    • Component/s: None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added a -useiterator option to DistCp, which uses listStatusIterator to build the listing. Its primary purpose is to reduce client-side memory usage while the listing is built.
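      To illustrate the idea behind the release note, here is a minimal sketch, not the DistCp implementation, that contrasts an eager listing (every path collected into one list) with a lazy, pull-based iterator that keeps only one open child iterator per directory on the current path. So that it runs without Hadoop, it uses a hypothetical in-memory tree (a Map from directory path to child paths), and the names LazyListing and listAll are illustrative. In Hadoop itself the iterator-based call is FileSystem#listStatusIterator(Path), which returns a RemoteIterator<FileStatus> rather than a java.util.Iterator; on filesystems that page their listings, such as HDFS, each per-directory iterator may fetch entries in batches. The sketch needs Java 9+ for List.of and Map.of.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical in-memory "filesystem": directory path -> child paths.
// Paths that are not keys in the map are treated as files.
public class LazyListing implements Iterator<String> {
    private final Map<String, List<String>> tree;
    // One open child iterator per directory on the current root-to-node path.
    private final Deque<Iterator<String>> open = new ArrayDeque<>();

    public LazyListing(Map<String, List<String>> tree, String root) {
        this.tree = tree;
        open.push(tree.getOrDefault(root, List.of()).iterator());
    }

    @Override
    public boolean hasNext() {
        // Drop directory iterators that have no entries left.
        while (!open.isEmpty() && !open.peek().hasNext()) {
            open.pop();
        }
        return !open.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String path = open.peek().next();
        List<String> children = tree.get(path);
        if (children != null) {
            open.push(children.iterator()); // descend before visiting siblings (depth-first)
        }
        return path;
    }

    /** Eager alternative: builds the complete list of paths before returning anything. */
    public static List<String> listAll(Map<String, List<String>> tree, String root) {
        List<String> out = new ArrayList<>();
        for (String child : tree.getOrDefault(root, List.of())) {
            out.add(child);
            out.addAll(listAll(tree, child));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = Map.of(
            "/src", List.of("/src/a", "/src/b"),
            "/src/a", List.of("/src/a/f1", "/src/a/f2"));
        LazyListing it = new LazyListing(tree, "/src");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

      On this tree both approaches produce /src/a, /src/a/f1, /src/a/f2, /src/b in that order; the lazy version hands entries to the caller one at a time instead of first building the full result list.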

      Description

      Presently, DistCp uses a producer-consumer setup while building the listing. Both the input queue and the output queue are unbounded, so the listStatus results held in memory can grow very large.
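      The following is a hypothetical illustration, not DistCp's code, of why an unbounded queue lets the listing grow: a producer enqueues entries while a deliberately slower consumer drains them, and the largest queue size is recorded. A LinkedBlockingQueue created without a capacity is effectively unbounded (its capacity is Integer.MAX_VALUE), so its peak is limited only by how far the producer gets ahead. An ArrayBlockingQueue of capacity 16 appears only for contrast, because its put() blocks when full; it is not the change made in this issue. The class name QueueGrowthDemo, the item count, and the sleep-based slowdown are assumptions for the demo.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class QueueGrowthDemo {
    private static final int SENTINEL = -1; // marks the end of the work stream

    /** Runs one producer and one slower consumer; returns {itemsConsumed, peakQueueSize}. */
    public static int[] run(BlockingQueue<Integer> queue, int items) throws InterruptedException {
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger consumed = new AtomicInteger();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    queue.put(i); // blocks only when a bounded queue is full
                    peak.accumulateAndGet(queue.size(), Math::max);
                }
                queue.put(SENTINEL);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (queue.take() != SENTINEL) {
                    // Pause briefly every 100 items so the producer can run ahead.
                    if (consumed.incrementAndGet() % 100 == 0) {
                        Thread.sleep(1);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return new int[] {consumed.get(), peak.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        int items = 10_000;
        int[] unbounded = run(new LinkedBlockingQueue<>(), items);
        int[] bounded = run(new ArrayBlockingQueue<>(16), items);
        System.out.println("unbounded: consumed=" + unbounded[0] + ", peak=" + unbounded[1]);
        System.out.println("bounded:   consumed=" + bounded[0] + ", peak=" + bounded[1]);
    }
}
```

      The unbounded peak varies between runs because it depends on thread scheduling; the bounded peak never exceeds 16.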

      Relevant code:

      https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java#L635

      This performs a breadth-first traversal (it uses a queue instead of the earlier stack), so if files sit at lower depths, it effectively opens up the entire tree before it starts processing them.
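      A deterministic sketch of the traversal-order point: it walks a hypothetical complete tree (every directory above maxDepth has `fanout` children) and records the peak number of pending entries when the frontier is used as a queue (breadth-first) versus a stack (depth-first). It models only the frontier size, not DistCp's listing threads or FileStatus objects, and the names TraversalFrontier and walk are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model: each pending entry is just the depth of a node in a
// complete tree where every node above maxDepth has `fanout` children.
public class TraversalFrontier {

    /** Returns {nodesVisited, peakPending} for a breadth-first or depth-first walk. */
    public static long[] walk(int fanout, int maxDepth, boolean breadthFirst) {
        Deque<Integer> pending = new ArrayDeque<>();
        pending.addLast(0); // the root
        long visited = 0;
        long peak = 1;
        while (!pending.isEmpty()) {
            // Queue: take from the head (FIFO). Stack: take from the tail (LIFO).
            int depth = breadthFirst ? pending.pollFirst() : pending.pollLast();
            visited++;
            if (depth < maxDepth) {
                for (int i = 0; i < fanout; i++) {
                    pending.addLast(depth + 1);
                }
            }
            peak = Math.max(peak, pending.size());
        }
        return new long[] {visited, peak};
    }

    public static void main(String[] args) {
        long[] bfs = walk(10, 4, true);
        long[] dfs = walk(10, 4, false);
        System.out.println("BFS visited=" + bfs[0] + " peak pending=" + bfs[1]); // BFS visited=11111 peak pending=10000
        System.out.println("DFS visited=" + dfs[0] + " peak pending=" + dfs[1]); // DFS visited=11111 peak pending=37
    }
}
```

      With fanout 10 and depth 4, both walks visit all 11111 nodes, but the queue holds all 10^4 leaves at once before processing any of them, while the stack never holds more than 1 + 4 * (10 - 1) = 37 entries.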

        Attachments

        1. gc-NewD-512M-3.8ML.log (34 kB, Ayush Saxena)
        2. MoveToStackIterator.patch (5 kB, Ayush Saxena)

              People

              • Assignee: Ayush Saxena (ayushtkn)
              • Reporter: Ayush Saxena (ayushtkn)
              • Votes: 0
              • Watchers: 6

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 10h 20m