Hadoop Map/Reduce
MAPREDUCE-544

deficit computation is biased by historical load



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved


      going through the code over the weekend - one of the things that struck me about the fair scheduler was how the accumulated deficit depends on the number of concurrent jobs, and how that can hurt other jobs later on.

      for example:

      • two jobs are running; one came first and is occupying all the slots. so the second job accumulates deficit at the rate of half the cluster size for some time. this can be a fairly high deficit (since only two jobs are running)
      • a large number of jobs now arrive. the second job, by virtue of its high deficit, out-schedules the new wave of jobs.

      what i don't like here is that deficit is based largely on the history of the jobs in the cluster. the current set of jobs has had no influence on that history, yet may be unduly penalized for it. jobs that happen to be pending when there are few jobs in the system benefit a lot.
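      the accumulation rule being criticized can be sketched as below. this is an illustrative model only, not the actual FairScheduler code; the class, method names, cluster size, and timings are all made up:

```java
// Hypothetical model of the current behavior: each scheduling interval,
// a job's deficit grows by (fair share - slots actually held) * time.
public class DeficitAccumulation {

    // Deficit (slot-seconds) gained over `seconds` when entitled to
    // `fairShare` slots but actually holding `heldSlots`.
    static long accumulate(double fairShare, double heldSlots, long seconds) {
        return (long) ((fairShare - heldSlots) * seconds);
    }

    public static void main(String[] args) {
        // Two jobs on a 100-slot cluster: job 2 holds 0 slots while
        // entitled to 50, for 10 minutes.
        long early = accumulate(50, 0, 600);
        // 20 jobs then arrive; job 2 now runs at its (smaller) fair
        // share and gains no new deficit - but the 30000 slot-seconds
        // banked earlier still let it out-schedule the newcomers.
        long late = accumulate(5, 5, 600);
        System.out.println(early + " " + late); // prints "30000 0"
    }
}
```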

      what i find more helpful is what the deficit is in the context of the current load. for example - let's say that a job has used 'C' compute units (slots * time) over its lifetime of T, i.e. it has effectively been using C/T slots over its lifetime. at any given point in time, its fair share may be a different number F (based on the jobs running at that point in time). so one can now compute deficit as (F - C/T)*T as of this point in time.
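      a minimal sketch of that recomputation, using illustrative names and slot-second units (again, not actual scheduler code):

```java
// Proposed recomputation: measure a job's effective slot usage C/T over
// its lifetime T, and express deficit against the *current* fair share F
// rather than the shares that held historically: deficit = (F - C/T) * T.
public class LoadAdjustedDeficit {

    // computeUnits: slot-seconds consumed (C); lifetime: seconds alive (T);
    // currentFairShare: fair share right now (F).
    static double deficit(double computeUnits, double lifetime,
                          double currentFairShare) {
        double effectiveSlots = computeUnits / lifetime;        // C/T
        return (currentFairShare - effectiveSlots) * lifetime;  // (F - C/T)*T
    }

    public static void main(String[] args) {
        // Job 2 from the scenario above: 0 slot-seconds over 600 s alive.
        // While only two jobs run, its fair share is 50 -> large deficit.
        System.out.println(deficit(0, 600, 50)); // prints "30000.0"
        // Once 20 jobs arrive, F drops to 5, and the *same* history now
        // yields a much smaller deficit - the bias toward old load is gone.
        System.out.println(deficit(0, 600, 5));  // prints "3000.0"
    }
}
```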

      clearly, this would address the scenario outlined above by not granting a large deficit to the second job once the wave of jobs arrives. consider another (corollary) scenario:

      • large job running with lots of other small jobs
      • the small jobs finish and another large job arrives (so two large jobs are running now)

      in this case too, the current algorithm does not do well. the first large job does not accumulate much deficit if fair sharing is working well. once the small jobs disappear and the second large job comes in, both the first and second large job get roughly 50-50 of the cluster. but this is counter-intuitive: in such a case, one would think that the first large job, by virtue of having run longer at a slower pace, should get more slots than the second large job.

      again, recomputing the deficit in the context of the current load, as described above, would fix this situation.
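      plugging illustrative numbers into the same (F - C/T)*T recomputation for this second scenario (a hypothetical 100-slot cluster, so each of the two large jobs has a current fair share of F = 50):

```java
// Second scenario with made-up numbers: after the small jobs finish,
// two large jobs split a 100-slot cluster, so F = 50 for both.
public class TwoLargeJobs {

    static double deficit(double computeUnits, double lifetime,
                          double fairShare) {
        return (fairShare - computeUnits / lifetime) * lifetime;
    }

    public static void main(String[] args) {
        // First large job: alive 3600 s, averaging only 20 slots while
        // the small jobs ran -> C = 20 * 3600 slot-seconds.
        double first = deficit(20 * 3600, 3600, 50);  // (50-20)*3600
        // Second large job: just arrived, 60 s old, no slots yet.
        double second = deficit(0, 60, 50);           // (50-0)*60
        System.out.println(first + " " + second); // prints "108000.0 3000.0"
        // The long-running job's much larger deficit earns it more slots,
        // matching intuition, instead of an immediate 50-50 split.
    }
}
```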




            Assignee: Unassigned
            Reporter: jsensarma (Joydeep Sen Sarma)