Hadoop Common / HADOOP-13169

Randomize file list in SimpleCopyListing


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0
    • Component/s: tools/distcp
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When copying files to S3, some mappers can hit S3 partition hotspots depending on how the file listing is ordered. This is most visible when data is copied from a Hive warehouse with many partitions (e.g. date partitions); in such cases some tasks tend to run much slower than others. It would be good to randomize the file paths written out in SimpleCopyListing to avoid this issue.
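      The fix itself is in the attached patches against DistCp's SimpleCopyListing. As a rough, hypothetical illustration of the idea only (not the actual patch code), the sketch below buffers the discovered source paths and shuffles them before they are handed to whatever writes the copy listing, so consecutive map tasks are less likely to work on lexicographically adjacent keys that land on the same S3 partition. The RandomizedListingSketch class and its method names are made up for this example.

      {code:java}
      import java.util.ArrayList;
      import java.util.Collections;
      import java.util.List;
      import java.util.Random;

      import org.apache.hadoop.fs.Path;

      /**
       * Illustrative sketch only: buffer the listed source paths, shuffle
       * them, and emit them in random order. Shuffling breaks up long runs
       * of adjacent keys (e.g. date partitions) that would otherwise be
       * copied back-to-back by a single mapper against one S3 partition.
       */
      public class RandomizedListingSketch {

        private final List<Path> buffer = new ArrayList<>();
        private final Random random = new Random();

        /** Collect paths as they are discovered during the listing traversal. */
        public void add(Path sourcePath) {
          buffer.add(sourcePath);
        }

        /** Shuffle the buffered paths and return them in random order. */
        public List<Path> drainShuffled() {
          Collections.shuffle(buffer, random);
          List<Path> shuffled = new ArrayList<>(buffer);
          buffer.clear();
          return shuffled;
        }
      }
      {code}

      In SimpleCopyListing the listing entries ultimately end up in a sequence file that the DistCp mappers consume, so any such randomization would have to happen before the entries are flushed to that file.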

      Attachments

        1. HADOOP-13169-branch-2-010.patch
          15 kB
          Rajesh Balamohan
        2. HADOOP-13169-branch-2-009.patch
          15 kB
          Rajesh Balamohan
        3. HADOOP-13169-branch-2-008.patch
          14 kB
          Rajesh Balamohan
        4. HADOOP-13169-branch-2-007.patch
          13 kB
          Rajesh Balamohan
        5. HADOOP-13169-branch-2-006.patch
          14 kB
          Rajesh Balamohan
        6. HADOOP-13169-branch-2-005.patch
          12 kB
          Rajesh Balamohan
        7. HADOOP-13169-branch-2-004.patch
          8 kB
          Rajesh Balamohan
        8. HADOOP-13169-branch-2-003.patch
          8 kB
          Rajesh Balamohan
        9. HADOOP-13169-branch-2-002.patch
          6 kB
          Rajesh Balamohan
        10. HADOOP-13169-branch-2-001.patch
          7 kB
          Rajesh Balamohan

            People

              Assignee: Rajesh Balamohan (rajesh.balamohan)
              Reporter: Rajesh Balamohan (rajesh.balamohan)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: