Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

Description

It should be possible to specify a limit to the number of tasks per job permitted to run simultaneously. If, for example, you have a cluster of 50 nodes, with 100 map task slots and 100 reduce task slots, and the configured limit is 25 simultaneous tasks/job, then four or more jobs will be able to run at a time. This will permit short jobs to pass longer-running jobs. This also avoids some problems we've seen with HOD, where nodes are underutilized in their tail, and it should permit improved input locality.
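To make the arithmetic concrete, here is a minimal illustrative sketch of such a per-job cap. The class and method names are hypothetical (this is not actual Hadoop scheduler code); it only shows how a cap of 25 on 100 slots yields the four-concurrent-jobs figure from the example above.

    // Illustrative sketch only; hypothetical names, not existing Hadoop code.
    public class TaskCapExample {
        private final int maxTasksPerJob;   // e.g. 25, as in the description above

        public TaskCapExample(int maxTasksPerJob) {
            this.maxTasksPerJob = maxTasksPerJob;
        }

        /** A job may claim another slot only while it is under its cap. */
        public boolean canSchedule(int tasksRunningForJob) {
            return tasksRunningForJob < maxTasksPerJob;
        }

        public static void main(String[] args) {
            TaskCapExample cap = new TaskCapExample(25);
            int mapSlots = 100;   // 50 nodes x 2 map slots
            System.out.println("Concurrent jobs at full cap: " + (mapSlots / 25));          // 4
            System.out.println("Job with 10 running tasks may schedule: " + cap.canSchedule(10));
        }
    }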

Issue Links

Activity

aw Allen Wittenauer added a comment -

I'm going to close this out as a duplicate of MAPREDUCE-5583.
tomwhite Tom White added a comment -

I think this is covered by HADOOP-5170. If so, we can close this issue as a duplicate.
un_brice Brice Arnould added a comment -

The fix for bug 3412 also fixes this one.
cutting Doug Cutting added a comment -

> The limit could be max(static_limit, number of cores in cluster / # active jobs)

Jinx!
tdunning@veoh.com Ted Dunning added a comment - edited

(Oops... yes, Doug anticipated this in his comment and I didn't read very well.)

Presumably the limit could be made dynamic. The limit could be max(static_limit, number of cores in cluster / # active jobs).

On further reflection, I should note that my big jobs are all limited in pretty much the way that Doug suggests, because they are processing a few large files that are unsplittable. This limits the number of slots these big jobs can eat up.

The result is pretty OK. My little jobs with lots of maps can slide through the cracks most of the time and everything runs pretty well.
cutting Doug Cutting added a comment -

I think a static limit for all jobs would be useful and best to implement first. After some experience with this, we would be better able to address its shortcomings. Possible future extensions might be:

• dynamically altering the limit, e.g., limit = max(min.tasks.per.job, numSlots / numJobsOutstanding)
  • ramping up the limit slowly, so that a user's sequential jobs don't have all their slots immediately taken when one job completes
  • ramping down the limit slowly, so that tasks are given an opportunity to finish normally before they are killed
• incorporating job priority into the limit
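For illustration, a hedged sketch of the dynamic limit described in the preceding comment, limit = max(min.tasks.per.job, numSlots / numJobsOutstanding). The class and method names are hypothetical, and the slow ramping up/down mentioned above is not modeled.

    // Hedged sketch of the dynamic limit; hypothetical names, not existing Hadoop code.
    public class DynamicTaskLimit {
        /** limit = max(min.tasks.per.job, numSlots / numJobsOutstanding) */
        static int computeLimit(int minTasksPerJob, int numSlots, int numJobsOutstanding) {
            if (numJobsOutstanding <= 0) {
                return numSlots;   // no contention: a lone job may use every slot
            }
            return Math.max(minTasksPerJob, numSlots / numJobsOutstanding);
        }

        public static void main(String[] args) {
            System.out.println(computeLimit(10, 100, 2));   // 50 slots per job
            System.out.println(computeLimit(10, 100, 25));  // 10: the static floor applies
        }
    }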
cutting Doug Cutting added a comment -

Some discussion of this issue may be found at: http://www.nabble.com/question-about-file-glob-in-hadoop-0.15-tt14702242.html#a14741794
acmurthy Arun C Murthy added a comment -

I'd like to throw job priority into this festering pool...

At least changing the job priority (done by the cluster admin) should result in a change in the number of max_slots... thoughts?
cutting Doug Cutting added a comment -

This addresses issues raised in HADOOP-2510.

People

• Assignee: Unassigned
• Reporter: cutting Doug Cutting
• Votes: 0
• Watchers: 10
