  1. Hadoop Common
  2. HADOOP-17531

DistCp: Reduce memory usage on copying huge directories

Details

    • Reviewed
    • Added a -useiterator option to DistCp that uses listStatusIterator to build the listing, primarily to reduce client-side memory usage while the listing is built.
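
    The difference between an eager listing and an iterator-based one can be illustrated with a minimal sketch. This is a local-filesystem analogy built only on java.io and java.nio.file, not the Hadoop API: File.listFiles() stands in for listStatus (the whole listing comes back as one array) and Files.newDirectoryStream stands in for listStatusIterator (entries are consumed one at a time). The class LazyListingSketch and its methods are hypothetical names used for illustration only.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration class; not part of Hadoop or DistCp.
final class LazyListingSketch {

    /** Eager: every child of dir is materialized in one array before any is examined. */
    public static int countEager(Path dir) {
        File[] children = dir.toFile().listFiles();
        return children == null ? 0 : children.length;
    }

    /** Lazy: entries are handed out one at a time, so the caller never holds the full listing. */
    public static int countLazy(Path dir) {
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path ignored : stream) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    /** Creates a temp directory holding the given number of files; returns {eager, lazy} counts. */
    public static int[] demo(int files) {
        try {
            Path dir = Files.createTempDirectory("listing-sketch");
            for (int i = 0; i < files; i++) {
                Files.createFile(dir.resolve("file-" + i));
            }
            return new int[] {countEager(dir), countLazy(dir)};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    Both methods see the same entries; what differs is how much of the listing has to be resident at once. Calling LazyListingSketch.demo(n) with a small n is enough to check that the two counts agree.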

    Description

      Presently, DistCp uses a producer-consumer setup while building the listing. The input queue and the output queue are both unbounded, so the accumulated listStatus results can grow very large.

      Relevant code:

      https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java#L635

      The traversal is breadth-first (it uses a queue instead of the earlier stack), so if the files sit at a lower depth, it opens up the entire tree before it starts processing them.
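
      The memory effect of the traversal order can be sketched with a toy model. The code below is not taken from SimpleCopyListing; it assumes a synthetic tree in which every directory above maxDepth has fanOut subdirectories, and it only tracks how many directories are pending at once. TraversalMemorySketch, peakPendingBfs and peakPendingDfs are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model (hypothetical): each pending directory is represented only by its depth.
final class TraversalMemorySketch {

    /** Breadth-first: pending directories wait in a FIFO queue. Returns the peak queue size. */
    public static int peakPendingBfs(int fanOut, int maxDepth) {
        Deque<Integer> queue = new ArrayDeque<>();
        queue.addLast(0); // root directory at depth 0
        int peak = 1;
        while (!queue.isEmpty()) {
            int depth = queue.removeFirst();
            if (depth < maxDepth) {
                for (int i = 0; i < fanOut; i++) {
                    queue.addLast(depth + 1); // enqueue every subdirectory
                }
            }
            peak = Math.max(peak, queue.size());
        }
        return peak;
    }

    /** Depth-first: pending directories wait on a LIFO stack. Returns the peak stack size. */
    public static int peakPendingDfs(int fanOut, int maxDepth) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.addLast(0); // root directory at depth 0
        int peak = 1;
        while (!stack.isEmpty()) {
            int depth = stack.removeLast();
            if (depth < maxDepth) {
                for (int i = 0; i < fanOut; i++) {
                    stack.addLast(depth + 1); // push every subdirectory
                }
            }
            peak = Math.max(peak, stack.size());
        }
        return peak;
    }
}
```

      In this model, with a fan-out of 10 and a depth of 4, the queue peaks at 10000 pending directories (the whole deepest level), while the stack peaks at 37, because a depth-first walk only keeps the unvisited siblings along the current path.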

      Attachments

        1. MoveToStackIterator.patch
          5 kB
          Ayush Saxena
        2. gc-NewD-512M-3.8ML.log
          34 kB
          Ayush Saxena

            People

              ayushtkn Ayush Saxena
              Votes: 0
              Watchers: 6

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 10h 20m