  1. Hadoop Common
  2. HADOOP-1440

JobClient should not sort input-splits

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.3
    • Fix Version/s: 0.14.0
    • None
    • None
    • All

    Description

      Currently, the JobClient sorts the InputSplits returned by the InputFormat in descending order of size, so that map tasks for larger input splits are scheduled before those for smaller ones. However, this causes problems for applications run with -reducer NONE that produce data sets partitioned in the same way as their input.

      With -reducer NONE, map task i produces part-i. However, typical applications that use -reducer NONE expect each map to produce a partition with the same index as its input partition.
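
      The mismatch can be illustrated with a small standalone sketch. It does not use Hadoop classes: Split and sortBySizeDescending are hypothetical stand-ins for an input split and for the size-based sort described above, and the split lengths are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitOrderDemo {
    /** Hypothetical stand-in for an input split: source partition index and length in bytes. */
    static final class Split {
        final int partition;
        final long length;

        Split(int partition, long length) {
            this.partition = partition;
            this.length = length;
        }
    }

    /** Returns a copy ordered largest-first, mimicking the sort described in this issue. */
    static List<Split> sortBySizeDescending(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort((a, b) -> Long.compare(b.length, a.length));
        return sorted;
    }

    public static void main(String[] args) {
        // Illustrative lengths only: partition 1 is the largest, partition 0 the smallest.
        List<Split> splits = Arrays.asList(
                new Split(0, 100L), new Split(1, 300L), new Split(2, 200L));
        List<Split> sorted = sortBySizeDescending(splits);
        for (int i = 0; i < sorted.size(); i++) {
            // Map task i writes part-i, whichever input partition it happened to read.
            System.out.println("map task " + i + " reads input partition "
                    + sorted.get(i).partition + " and writes part-" + i);
        }
    }
}
```

      With these lengths, map task 0 reads input partition 1, so part-0 holds the data of partition 1 rather than partition 0.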

      (Of course, this requires that each partition should be fed in its entirety to a map, rather than splitting it into blocks, but that is a separate issue.)

      Thus, either the sorting of input splits should be controllable via a configuration variable, or FileInputFormat should sort the splits and JobClient should honor the order in which they are returned.
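
      The first option, a configuration switch, might look roughly like the sketch below. This is an assumption-laden illustration, not the attached patch: java.util.Properties stands in for the job configuration, and the property name example.jobclient.sort.splits and the method taskOrder are hypothetical.

```java
import java.util.Arrays;
import java.util.Properties;

public class SplitOrderPolicy {
    /** Hypothetical property name; not an actual Hadoop configuration key. */
    static final String SORT_SPLITS_KEY = "example.jobclient.sort.splits";

    /**
     * Returns, for each map task index, the index of the input split it would read.
     * Sorting defaults to on (largest split first); setting the property to "false"
     * keeps the order in which the splits were returned.
     */
    static int[] taskOrder(long[] splitLengths, Properties conf) {
        boolean sort = Boolean.parseBoolean(conf.getProperty(SORT_SPLITS_KEY, "true"));
        Integer[] order = new Integer[splitLengths.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        if (sort) {
            // Stable sort by descending length; equal-sized splits keep their relative order.
            Arrays.sort(order, (a, b) -> Long.compare(splitLengths[b], splitLengths[a]));
        }
        int[] result = new int[order.length];
        for (int i = 0; i < order.length; i++) {
            result[i] = order[i];
        }
        return result;
    }

    public static void main(String[] args) {
        long[] lengths = {100L, 300L, 200L}; // illustrative sizes only
        Properties conf = new Properties();
        conf.setProperty(SORT_SPLITS_KEY, "false");
        System.out.println(Arrays.toString(taskOrder(lengths, conf)));
    }
}
```

      With sorting disabled, map task i reads split i, so part-i lines up with input partition i as long as each partition is a single split.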

      Attachments

        1. HADOOP-1440_1.patch
          10 kB
          Senthil Subramanian

            People

              senthil Senthil Subramanian
              milindb Milind Barve