  1. Hadoop Common
  2. HADOOP-13169

Randomize file list in SimpleCopyListing


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0
    • Component/s: tools/distcp
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When copying files to S3, some mappers can hit S3 partition hotspots, depending on the order of the file listing. This is more visible when data is copied from a Hive warehouse with many partitions (e.g., date partitions). In such cases, some tasks tend to be much slower than others. It would be good to randomize the file paths written out by SimpleCopyListing to avoid this issue.
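
      The idea can be illustrated with a minimal sketch. It is not the patch attached to this issue: the class name `ShuffledListingSketch`, the method `shuffled`, the seed parameter, and the sample paths are all hypothetical, and plain strings stand in for the file entries that SimpleCopyListing writes out. It only shows that shuffling a listing before it is split among mappers spreads neighbouring partition paths across tasks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not the code from the HADOOP-13169 patches.
class ShuffledListingSketch {

    // Returns a shuffled copy of the listing; the input list is left untouched.
    // The seed is an assumption added so that runs are reproducible.
    static List<String> shuffled(List<String> paths, long seed) {
        List<String> copy = new ArrayList<>(paths);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        // A listing in natural order: files of the same date partition sit
        // next to each other, so consecutive entries share a key prefix.
        List<String> listing = new ArrayList<>();
        for (int day = 1; day <= 3; day++) {
            for (int part = 0; part < 2; part++) {
                listing.add(String.format(
                        "warehouse/sales/dt=2016-05-%02d/part-%05d", day, part));
            }
        }

        // After shuffling, consecutive entries (and hence the slices handed
        // to individual mappers) are less likely to share a prefix.
        for (String path : shuffled(listing, 42L)) {
            System.out.println(path);
        }
    }
}
```

      Shuffling a copy keeps the original order available to the caller, and a fixed seed makes a slow run reproducible while debugging; a real listing may be too large to shuffle in memory in one pass, which this sketch does not address.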

      Attachments

        1. HADOOP-13169-branch-2-010.patch
          15 kB
          Rajesh Balamohan
        2. HADOOP-13169-branch-2-009.patch
          15 kB
          Rajesh Balamohan
        3. HADOOP-13169-branch-2-008.patch
          14 kB
          Rajesh Balamohan
        4. HADOOP-13169-branch-2-007.patch
          13 kB
          Rajesh Balamohan
        5. HADOOP-13169-branch-2-006.patch
          14 kB
          Rajesh Balamohan
        6. HADOOP-13169-branch-2-005.patch
          12 kB
          Rajesh Balamohan
        7. HADOOP-13169-branch-2-004.patch
          8 kB
          Rajesh Balamohan
        8. HADOOP-13169-branch-2-003.patch
          8 kB
          Rajesh Balamohan
        9. HADOOP-13169-branch-2-002.patch
          6 kB
          Rajesh Balamohan
        10. HADOOP-13169-branch-2-001.patch
          7 kB
          Rajesh Balamohan

          People

            Assignee: Rajesh Balamohan
            Reporter: Rajesh Balamohan
            Votes: 0
            Watchers: 4
