Hadoop Common / HADOOP-5733

Add map/reduce slot capacity and lost map/reduce slot capacity to JobTracker metrics

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: metrics
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      It would be nice to have the actual map/reduce slot capacity and the lost map/reduce slot capacity (number of blacklisted nodes * map slots per node, or * reduce slots per node) in the JobTracker metrics. This information can be used to calculate the JobTracker's view of slot utilization.
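
The arithmetic in the description can be sketched as follows. This is only an illustration: the class and method names are hypothetical and not part of Hadoop, and the utilization formula (occupied slots divided by total capacity minus lost capacity) is an assumption about how the two metrics might be combined, since the issue does not define it.

```java
// Hypothetical helper for illustration only; not a Hadoop class.
final class SlotCapacitySketch {

    /** Lost capacity as described in the issue: blacklisted nodes * slots per node. */
    static int lostSlots(int blacklistedNodes, int slotsPerNode) {
        return blacklistedNodes * slotsPerNode;
    }

    /**
     * Assumed definition of utilization: occupied slots over usable slots,
     * where usable = total capacity - lost capacity. Returns 0 if nothing is usable.
     */
    static double utilization(int occupiedSlots, int totalSlots, int lostSlots) {
        int usable = totalSlots - lostSlots;
        return usable <= 0 ? 0.0 : (double) occupiedSlots / usable;
    }

    public static void main(String[] args) {
        int lost = lostSlots(3, 4);                       // 3 blacklisted nodes, 4 map slots each
        System.out.println("lost map slots = " + lost);   // lost map slots = 12
        System.out.println("utilization = " + utilization(66, 100, lost)); // utilization = 0.75
    }
}
```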

      1. hadoop-5733-1.patch
        7 kB
        Sreekanth Ramakrishnan
      2. hadoop-5733-2.patch
        6 kB
        Sreekanth Ramakrishnan
      3. hadoop-5733-3.patch
        6 kB
        Sreekanth Ramakrishnan
      4. hadoop-5733-4.patch
        6 kB
        Chris Douglas
      5. hadoop-5733-v20.patch
        6 kB
        Robert Chansler

        Activity

        Sreekanth Ramakrishnan added a comment -

        Attaching a patch addressing this issue.

        Added the following new fields:

        • map_slots : Number of map slots in the cluster.
        • reduce_slots : Number of reduce slots in the cluster.
        • blacklisted_maps : Number of map slots blacklisted.
        • blacklisted_reduces : Number of reduce slots blacklisted.

        Made changes in JobTracker to publish these metrics.

        Chris Douglas added a comment -

        Looks good

        For the map/reduce slots:

        • Instead of {add,dec}*Slots, consider adding set*Slots to the instrumentation and update with total*TaskCapacity (use MetricsRecord::setMetric)
        • Updates can occur outside the synchronized block in addHostCapacity and removeHostCapacity. With get/set, the field in the metrics can be volatile and updated without synchronizing on the instrumentation

        For the blacklisted slots:

        • The add/dec methods should be synchronized; there's a race condition with doUpdate
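
A minimal sketch of the two suggestions above, under stated assumptions: the class and its methods are hypothetical stand-ins, not the actual JobTracker instrumentation. The slot count is an absolute value that is overwritten as a whole, so a volatile field with a plain setter is enough. The blacklisted count is accumulated as a delta and periodically read and cleared, so add/dec must hold the same lock as the read-and-clear step or an update can be lost.

```java
// Hypothetical stand-in for illustration; not Hadoop's instrumentation class.
final class SlotInstrumentationSketch {
    // Absolute value, overwritten wholesale: volatile gives visibility without a lock.
    private volatile int mapSlots;
    // Delta accumulated between publishes; guarded by 'this'.
    private int blacklistedMapsDelta;

    void setMapSlots(int slots) { mapSlots = slots; }
    int getMapSlots() { return mapSlots; }

    synchronized void addBlacklistedMapSlots(int n) { blacklistedMapsDelta += n; }
    synchronized void decBlacklistedMapSlots(int n) { blacklistedMapsDelta -= n; }

    /** Read-and-clear, as a periodic publish would do; takes the same lock as add/dec. */
    synchronized int drainBlacklistedMapsDelta() {
        int delta = blacklistedMapsDelta;
        blacklistedMapsDelta = 0;
        return delta;
    }

    /** Runs writer threads while draining concurrently; returns the sum of everything drained. */
    static int hammer(int threads, int addsPerThread) {
        SlotInstrumentationSketch inst = new SlotInstrumentationSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < addsPerThread; j++) {
                    inst.addBlacklistedMapSlots(1);
                }
            });
            workers[i].start();
        }
        int drained = 0;
        try {
            for (Thread worker : workers) {
                drained += inst.drainBlacklistedMapsDelta(); // interleaves with the writers
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return drained + inst.drainBlacklistedMapsDelta(); // pick up whatever is left
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 10_000)); // 40000: no increment is lost
    }
}
```

If the `synchronized` keyword were dropped from the add/dec methods, an increment landing between the read and the clear in the drain step could be discarded, which is the kind of race the review comment points at.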
        Sreekanth Ramakrishnan added a comment -

        Attaching a patch incorporating the comments:

        • Changed the map slot and reduce slot metrics from incrMetric to setMetric.
        • Changed the fields holding map slots and reduce slots to volatile, so the setters need not be synchronized.
        • The map and reduce slots are set in updateTaskTrackerStatus in JobTracker.
        • The setters for blacklisted slots have been made synchronized.
        Chris Douglas added a comment -
        • When using setMetric, doUpdates shouldn't reset the metric to 0
        • set*Slots doesn't need to be adjusted in add/removeHostCapacity, as in the original patch?
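
The first point can be illustrated with a toy record. Everything here is a hypothetical stand-in (ToyMetricsRecord is not Hadoop's MetricsRecord, and the set/increment behaviour shown is an assumption made for the sketch): a value published with set semantics stays correct only if the backing field is left alone after publishing, while a value published with increment semantics must have its pending delta cleared so the same delta is not added twice.

```java
// Hypothetical publisher for illustration; not the JobTracker instrumentation.
final class PublishCycleSketch {
    private volatile int mapSlots;      // absolute value, published with set semantics
    private int blacklistedMapsDelta;   // pending delta, published with increment semantics

    void setMapSlots(int slots) { mapSlots = slots; }
    synchronized void addBlacklistedMapSlots(int n) { blacklistedMapsDelta += n; }

    synchronized void doUpdates(ToyMetricsRecord record) {
        // Set semantics: do not reset mapSlots afterwards, or the next cycle
        // would publish 0 even though the cluster capacity is unchanged.
        record.setMetric("map_slots", mapSlots);
        // Increment semantics: clear the delta, or it would be counted again next cycle.
        record.incrMetric("blacklisted_maps", blacklistedMapsDelta);
        blacklistedMapsDelta = 0;
    }

    public static void main(String[] args) {
        ToyMetricsRecord record = new ToyMetricsRecord();
        PublishCycleSketch metrics = new PublishCycleSketch();
        metrics.setMapSlots(40);
        metrics.addBlacklistedMapSlots(8);
        metrics.doUpdates(record);
        metrics.doUpdates(record); // second cycle with no new activity
        System.out.println(record.get("map_slots"));        // 40
        System.out.println(record.get("blacklisted_maps")); // 8
    }
}

// Toy record: setMetric overwrites, incrMetric adds to the stored value.
final class ToyMetricsRecord {
    private final java.util.Map<String, Integer> values = new java.util.HashMap<>();

    void setMetric(String name, int value) { values.put(name, value); }
    void incrMetric(String name, int delta) { values.merge(name, delta, Integer::sum); }
    int get(String name) { return values.getOrDefault(name, 0); }
}
```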
        Sreekanth Ramakrishnan added a comment -

        • The metrics fields are no longer reset during doUpdates.
        • Setting the slots from add/removeHostCapacity has been removed. In the previous patches the map and reduce slot metrics were incremental fields, so they were adjusted whenever capacity was added or removed. Now that they are absolute values, they are set whenever the TaskTracker statuses have updated the JobTracker's internal capacity fields. The calls are retained for blacklisting: when a tracker is marked blacklisted, the blacklisted slot counts are incremented/decremented in add/removeHostCapacity.
        Chris Douglas added a comment -

        Merged with trunk to resolve conflicts with HADOOP-5738. Also re-added the blacklist metric resets to doUpdates, since incrMetric semantics still require them there.

             [exec] -1 overall.  
             [exec] 
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec] 
             [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
             [exec]                         Please justify why no tests are needed for this patch.
             [exec] 
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec] 
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec] 
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec] 
             [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
             [exec] 
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        
        Chris Douglas added a comment -

        I committed this. Thanks, Sreekanth

        Chris Douglas added a comment -

        Setting the slots from add/removeHostCapacity has been removed. In the previous patches the map and reduce slot metrics were incremental fields, so they were adjusted whenever capacity was added or removed. Now that they are absolute values, they are set whenever the TaskTracker statuses have updated the JobTracker's internal capacity fields. The calls are retained for blacklisting: when a tracker is marked blacklisted, the blacklisted slot counts are incremented/decremented in add/removeHostCapacity.

        Sorry, I forgot to acknowledge this. Thanks for the explanation

        Hudson added a comment -

        Integrated in Hadoop-trunk #827 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/827/).
        Add map/reduce slot capacity and blacklisted capacity to JobTracker metrics. Contributed by Sreekanth Ramakrishnan

        Robert Chansler added a comment -

        Attached an example patch for an earlier version; not to be committed.

        Starry Shi added a comment -

        It seems that the patch cannot be applied directly to the 0.21.0 release. The source folder has changed in this release. It also cannot be applied to the 0.20.1 and 0.20.2 releases. I think it would be better to specify which release this patch should be applied to.


          People

          • Assignee:
            Sreekanth Ramakrishnan
            Reporter:
            Hong Tang
          • Votes:
            0
            Watchers:
            4
