Hadoop Map/Reduce / MAPREDUCE-537

Instrument events in the capacity scheduler for collecting metrics information

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      We need to instrument various events in the capacity scheduler so that we can collect metrics about them. This data will help us determine improvements to the scheduling strategies themselves.

        Activity

        rahul k singh added a comment -

        The following metrics would be collected.

        Wasted heartbeats (this would help in finding out the number of heartbeats wasted):
        1. assignTasks returns null

        How many times mapscheduler is invoked versus how many times reduceScheduler is invoked:
        2. mapscheduler is invoked
        3. reduceScheduler is invoked
        4. Task scheduled from queue
        5. Task scheduled from queue ignoring user limits
        6. High RAM job's task scheduled from queue

        Skip counts in the following cases:
        7. Task skipped because the user limit was exceeded, with reason
        8. Task skipped because of high RAM jobs, with reason

        9. Priority of job changed
        10. Number of times the initializer skips initializing pending jobs

        Status events:
        11. Job becomes running
        12. Job added
        13. Failed jobs in queue
        14. Killed jobs in queue
        15. Completed jobs in queue

        Queue statistics:
        16. Running tasks in queue and per user
        17. Pending tasks in queue
        18. Failed/killed tasks in queue
        19. Amount of time queue is over capacity
        rahul k singh added a comment -

        This patch uses the patch from https://issues.apache.org/jira/browse/MAPREDUCE-467.

        rahul k singh added a comment -

        There has been discussion about not going ahead with the timewindow model that is part of the submitted patch. We would use the metrics API to generate the metrics instead.

        Allen Wittenauer added a comment -

        I'm going to close this as stale.


          People

          • Assignee: Unassigned
          • Reporter: Hemanth Yamijala
          • Votes: 0
          • Watchers: 3

            Dates

            • Created:
              Updated:
              Resolved:
