Hadoop Common / HADOOP-11827

Speed-up distcp buildListing() using threadpool

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.0, 2.7.1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: tools/distcp
    • Labels: None

    Description

      For very large source trees on S3, distcp takes a long time to build the file listing (client code, before starting mappers). For a dataset I used (1.5M files, 50K dirs), it took 65 minutes before my fix in HADOOP-11785 and 36 minutes after the fix.
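
      The issue title proposes a thread pool for the listing phase. A minimal sketch of that idea follows, assuming a local filesystem accessed through java.nio.file as a stand-in for the Hadoop FileSystem API. The class ParallelListing, the method buildListing, and the numThreads parameter are hypothetical names chosen for illustration; they are not taken from the attached patches, and the level-by-level traversal is one possible design, not necessarily the one the patch uses.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not the code from the HADOOP-11827 patches.
final class ParallelListing {

    // Lists the immediate children of one directory (one "list" call).
    private static List<Path> listDir(Path dir) throws IOException {
        List<Path> children = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path child : stream) {
                children.add(child);
            }
        }
        return children;
    }

    // Walks the tree one level at a time. All directories of a level are
    // listed concurrently on the pool; results are gathered on the caller's
    // thread, so the output list needs no synchronization.
    static List<Path> buildListing(Path root, int numThreads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        List<Path> listing = new ArrayList<>();
        try {
            List<Path> level = Collections.singletonList(root);
            while (!level.isEmpty()) {
                // Submit one list task per directory in the current level.
                List<Future<List<Path>>> pending = new ArrayList<>();
                for (Path dir : level) {
                    pending.add(pool.submit(() -> listDir(dir)));
                }
                // Collect results; subdirectories form the next level.
                List<Path> nextLevel = new ArrayList<>();
                for (Future<List<Path>> result : pending) {
                    for (Path child : result.get()) {
                        listing.add(child);
                        // Do not follow symlinks, to avoid cycles.
                        if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                            nextLevel.add(child);
                        }
                    }
                }
                level = nextLevel;
            }
        } catch (ExecutionException e) {
            throw new IOException("listing failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
        return listing;
    }
}
```

      With this sketch, a call such as ParallelListing.buildListing(root, 20) would keep up to 20 list calls in flight at once. Any gain would be expected mainly where each list call is latency-bound, as is plausible for a remote object store; the sketch makes no claim about the speed-up achieved by the actual patch.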

      Attachments

        1. HADOOP-11827.patch
          37 kB
          Zoran Dimitrijevic
        2. HADOOP-11827-02.patch
          37 kB
          Zoran Dimitrijevic
        3. HADOOP-11827-03.patch
          38 kB
          Zoran Dimitrijevic
        4. HADOOP-11827-04.patch
          38 kB
          Ravi Prakash

          People

            Assignee: Zoran Dimitrijevic (3opan)
            Reporter: Zoran Dimitrijevic (3opan)
            Votes: 0
            Watchers: 6

              Time Tracking

                Original Estimate: 24h
                Remaining Estimate: 24h
                Time Spent: Not Specified