  Hadoop Map/Reduce / MAPREDUCE-5085

JobClient reorders splits


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      The JobClient hard-codes the ordering of splits by descending size. While this may be fine for traditional batch MapReduce jobs, it is not well suited to map-only jobs where the client cares about the order in which maps execute. Moreover, by always running the more expensive mappers early in the job, the cluster is taxed more heavily at the start instead of being utilized uniformly over time.

      ...JobClient.java
        private <T extends InputSplit>
        int writeNewSplits(JobContext job, Path jobSubmitDir) throws IOException,
            InterruptedException, ClassNotFoundException {
          // ...
          // sort the splits into order based on size, so that the biggest
          // go first
          Arrays.sort(array, new SplitComparator());
          JobSplitWriter.createSplitFiles(jobSubmitDir, conf,
              jobSubmitDir.getFileSystem(conf), array);
          return array.length;
        }
      

      It should be straightforward to make the SplitComparator an instance variable of JobClient and let callers set it when they care about the order in which splits are attempted.
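
      A minimal sketch of that idea follows. It is a simplified stand-in, not Hadoop code: the class SplitOrderingSketch, its Split type, the setter setSplitComparator and the two comparator constants are all hypothetical names chosen for illustration. The default comparator mimics the largest-first behaviour described above, and a caller can swap in another one. It relies on Arrays.sort being stable for object arrays, so a comparator that treats every split as equal keeps the order in which the splits were generated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified stand-in for the submission path; NOT the real Hadoop JobClient.
class SplitOrderingSketch {

    // Hypothetical minimal split: a label plus a length in bytes.
    static final class Split {
        final String name;
        final long length;

        Split(String name, long length) {
            this.name = name;
            this.length = length;
        }
    }

    // Mirrors the behaviour described in the report: biggest splits first.
    static final Comparator<Split> LARGEST_FIRST =
        (a, b) -> Long.compare(b.length, a.length);

    // Treats all splits as equal. Because Arrays.sort on objects is stable,
    // this preserves the order in which the splits were generated.
    static final Comparator<Split> KEEP_INPUT_ORDER = (a, b) -> 0;

    // Instance variable instead of a hard-coded "new SplitComparator()".
    private Comparator<Split> splitComparator = LARGEST_FIRST;

    // Hypothetical setter; an assumption of this sketch, not an existing API.
    void setSplitComparator(Comparator<Split> comparator) {
        this.splitComparator = comparator;
    }

    // Returns the split names in the order they would be written out.
    List<String> orderSplits(List<Split> splits) {
        Split[] array = splits.toArray(new Split[0]);
        Arrays.sort(array, splitComparator);
        List<String> names = new ArrayList<>();
        for (Split s : array) {
            names.add(s.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Split> splits = Arrays.asList(
            new Split("part-0", 10),
            new Split("part-1", 30),
            new Split("part-2", 20));

        SplitOrderingSketch client = new SplitOrderingSketch();
        System.out.println(client.orderSplits(splits)); // [part-1, part-2, part-0]

        client.setSplitComparator(KEEP_INPUT_ORDER);
        System.out.println(client.orderSplits(splits)); // [part-0, part-1, part-2]
    }
}
```

      Keeping largest-first as the default would leave existing jobs unaffected, while a map-only job that depends on execution order could opt out explicitly.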

          People

            Assignee: Unassigned
            Reporter: ledion bitincka
            Votes: 0
            Watchers: 5
