Hadoop Map/Reduce / MAPREDUCE-2048

reduce overhead of sorting jobs/pools in FairScheduler heartbeat processing


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: contrib/fair-share
    • Labels: None

    Description

      The JobTracker (JT) is bottlenecked by the jobtracker lock. The FairScheduler sorts jobs (and, in hadoop-trunk, pools) once per heartbeat while this lock is held. This shows up as one of the places where a lot of time is spent holding the jobtracker lock.

      We can avoid sorting the jobs/pools on every heartbeat and instead sort in the update thread (which runs periodically). The sorted set can then be maintained incrementally: as jobs/pools are scheduled in each heartbeat, the affected entry can be deleted from and re-inserted into the sorted set.

      This may be less of an issue in trunk (where pools are sorted first and then jobs within each pool) than in hadoop-20 (where all jobs are sorted together). However, our workload has many pools (one per user) and many jobs in some pools (the production pools), so I think it is reasonable to assume this is worth addressing in trunk as well.
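      A minimal Java sketch of the incremental idea described above (the class and field names here are hypothetical illustrations, not the actual FairScheduler code): a TreeSet ordered by the scheduling metric is kept across heartbeats, and when a job receives a task its entry is removed and re-inserted in O(log n), instead of re-sorting the whole job list under the jobtracker lock each heartbeat.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in for a schedulable job; "runningTasks" is an
// illustrative sort key (lower => needier => scheduled first).
class Job {
    final String name;
    int runningTasks;

    Job(String name, int runningTasks) {
        this.name = name;
        this.runningTasks = runningTasks;
    }
}

class IncrementalScheduler {
    // Sorted once; maintained incrementally rather than re-sorted per heartbeat.
    // The name tie-break keeps distinct jobs with equal keys both in the set.
    private final TreeSet<Job> sorted = new TreeSet<>(
        Comparator.<Job>comparingInt(j -> j.runningTasks)
                  .thenComparing(j -> j.name));

    void add(Job j) {
        sorted.add(j);
    }

    // Called from heartbeat processing: pick the neediest job, then restore
    // its position with an O(log n) delete/insert. The job MUST be removed
    // before its sort key is mutated, or the TreeSet ordering breaks.
    Job assignTask() {
        Job j = sorted.pollFirst();
        if (j == null) return null;
        j.runningTasks++;   // the job just received a task this heartbeat
        sorted.add(j);      // re-insert at its new sorted position
        return j;
    }
}
```

      The remove-before-mutate step is the key constraint of this approach: a TreeSet locates elements by comparison, so changing a sort key while the element is in the set silently corrupts lookups.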


          People

            Assignee: Joydeep Sen Sarma (jsensarma)
            Reporter: Joydeep Sen Sarma (jsensarma)
            Votes: 0
            Watchers: 8

            Dates

              Created:
              Updated: