Hadoop Common / HADOOP-6508

Incorrect values for metrics with CompositeContext

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: 0.23.0
    • Component/s: metrics
    • Labels:
      None

      Description

      In our clusters, when we use CompositeContext with two contexts, the second context gets wrong values.
      The problem reproduces consistently on clusters of 500 nodes and above.

      1. CompositeContext.png
        125 kB
        Amareshwari Sriramadasu
      2. CompositeContext-solution.png
        91 kB
        Amareshwari Sriramadasu

          Activity

          Allen Wittenauer added a comment -

          Should this really be resolved/fixed or is this JIRA in some other state?

          Luke Lu added a comment -

          The metrics2 design ensures that all the backends see the same metrics for a given snapshot. We tested and deployed the parallel backends with metrics2 in QA and production clusters for more than 8 months. No related issues have been found so far.

          Amareshwari Sriramadasu added a comment -

          One solution we thought of is to make CompositeContext a true middleman. CompositeContext registers JobTracker as its updater, and its sub-contexts register CompositeContext as their updater. CompositeContext's MetricsRecord is like any other record updated by JobTracker, and CompositeContext updates its sub-contexts on the call to doUpdates(). Here, CompositeContext takes care of synchronization between the different threads accessing its record.

          See the attached png for the updated relation between JobTracker and CompositeContext. JobTracker (JT) has CompositeContext (CC) and CompositeRecord (CR). CC has a monitoring thread which calls doUpdates() periodically and updates CR. CC also starts the timer threads of Context1 (C1) and Context2 (C2). C1 has MetricsRecord (R1) and C2 has MetricsRecord (R2). CC updates R1 or R2 from the value of CR, depending on whether the doUpdates() call comes from C1 or C2, respectively. Here, CC registers JT as the updater, and C1 and C2 register CC as the updater.

          Thoughts?
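
A minimal sketch of the middleman idea proposed above, assuming plain maps stand in for the records. SourceUpdater and MiddlemanContext are hypothetical names introduced only for illustration; none of this is taken from the Hadoop source. CC holds one composite record (CR) that only the JT refreshes, and each sub-context's timer thread copies CR into its own record under CC's lock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical callback standing in for the JobTracker (JT) as updater.
interface SourceUpdater {
    void fill(Map<String, Integer> compositeRecord);
}

// Hypothetical stand-in for CompositeContext (CC) acting as a middleman.
class MiddlemanContext {
    // CR: the single record that JT writes into.
    private final Map<String, Integer> compositeRecord = new HashMap<>();
    private final SourceUpdater source;

    MiddlemanContext(SourceUpdater source) {
        this.source = source;
    }

    // CC's own monitoring step: ask JT to refresh CR.
    synchronized void refresh() {
        source.fill(compositeRecord);
    }

    // Called from the timer thread of C1 or C2 with that sub-context's own
    // record (R1 or R2). Only the caller's record is touched, and the copy
    // happens under the same lock as refresh().
    synchronized void doUpdates(Map<String, Integer> subRecord) {
        subRecord.putAll(compositeRecord);
    }
}
```

With this shape, R1 and R2 are each written by one timer thread only, and both are filled from the same CR snapshot until the next refresh().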

          Amareshwari Sriramadasu added a comment -

          Analyzing the following JobTrackerMetricsInst code with CompositeContext as the MetricsContext:

              MetricsContext context = MetricsUtil.getContext("mapred");
              metricsRecord = MetricsUtil.createRecord(context, "jobtracker");
              metricsRecord.setTag("sessionId", sessionId);
              context.registerUpdater(this);
          

          Details on each line of code:

              MetricsContext context = MetricsUtil.getContext("mapred");
          

          This code creates a CompositeContext (CC), which creates all its sub-contexts and calls startMonitoring() on each
          of them. Thus there are as many monitoring threads as there are sub-contexts. Each thread calls
          doUpdates() followed by emitRecords() at the configured period.

              metricsRecord = MetricsUtil.createRecord(context, "jobtracker");
          

          This code creates a MetricsRecord for CompositeContext. The record is a proxy with a delegator for all the sub-records: every
          method call made on it is invoked on each of its sub-records.
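
As a rough illustration of the delegation described above, the following sketch builds a proxy record with java.lang.reflect.Proxy that forwards each call to every sub-record. SimpleRecord, SubRecord and ProxyRecordSketch are hypothetical, cut-down names introduced here for illustration; they are not the Hadoop classes.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, cut-down stand-in for a metrics record interface.
interface SimpleRecord {
    void setMetric(String name, int value);
    void update();
}

// Hypothetical sub-record: remembers the values set on it and counts update() calls.
class SubRecord implements SimpleRecord {
    final Map<String, Integer> pending = new HashMap<>();
    int updates = 0;

    public void setMetric(String name, int value) {
        pending.put(name, value);
    }

    public void update() {
        updates++;
    }
}

class ProxyRecordSketch {
    // Builds a proxy that forwards every SimpleRecord call to each sub-record in order.
    static SimpleRecord delegateTo(List<? extends SimpleRecord> subRecords) {
        InvocationHandler handler = (proxy, method, args) -> {
            for (SimpleRecord r : subRecords) {
                method.invoke(r, args);
            }
            return null; // every method of SimpleRecord returns void
        };
        return (SimpleRecord) Proxy.newProxyInstance(
                SimpleRecord.class.getClassLoader(),
                new Class<?>[] {SimpleRecord.class},
                handler);
    }
}
```

Note that nothing in the proxy itself synchronizes anything: a single update() call on the proxy reaches all sub-records, whichever thread makes it.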

              context.registerUpdater(this);
          

          This code registers JobTracker as the updater for all the sub-contexts.

          Putting the above relation pictorially (see the attached png): JobTracker (JT) has CompositeContext (CC) and ProxyMetricsRecord (PR). CC starts
          the timer threads of Context1 (C1) and Context2 (C2). C1 has MetricsRecord (R1) and C2 has MetricsRecord (R2). PR delegates
          all method calls made on it to R1 and R2. C1 and C2 register JobTracker as the updater.

          Both C1 and C2 call JT.doUpdates() at the specified periods. The code flow for doUpdates() from C1 or C2:
          1. Set/Incr methods on ProxyRecord are delegated to both R1 and R2. In the JobTracker code these calls are
          synchronized.
          2. MetricsRecord.update() boils down to R1.update() and R2.update(), irrespective of whether it is called from C1 or C2. This
          call is not synchronized on JT.

          Moreover, the MetricsRecord javadoc clearly says:
          "Different threads should not use the same MetricsRecord instance at the same time." So the problem here is that the ProxyRecord is shared between two threads without synchronization, and there is a possibility of a race if the above updates are not synchronized. I think this is the most likely cause of the incorrect values seen with CompositeContext.
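
One way to respect that javadoc constraint is to serialize every use of the shared record behind a single lock. The sketch below assumes a toy record that is not safe for concurrent use; UnsafeRecord and SerializedUpdater are hypothetical names, and the code is not taken from the Hadoop source. Two timer threads call doUpdates(), and the lock covers both the increment and the update() call.

```java
// Hypothetical stand-in for a record that must not be used by two threads at once.
class UnsafeRecord {
    private int pendingDelta = 0; // increments since the last update()
    private int published = 0;    // value handed on to the backend

    void incr(int by) {
        pendingDelta += by;
    }

    void update() {
        published += pendingDelta;
        pendingDelta = 0;
    }

    int published() {
        return published;
    }
}

// Hypothetical updater whose doUpdates() is called from several timer threads.
class SerializedUpdater {
    private final UnsafeRecord record = new UnsafeRecord();
    private final Object lock = new Object();

    void doUpdates(int newTrackers) {
        // The lock covers incr() and update() together, so only one thread
        // uses the shared record at any time.
        synchronized (lock) {
            record.incr(newTrackers);
            record.update();
        }
    }

    int published() {
        synchronized (lock) {
            return record.published();
        }
    }

    // Runs two threads (standing in for the C1 and C2 timers) against one updater.
    static int runTwoThreads(int callsPerThread) {
        SerializedUpdater updater = new SerializedUpdater();
        Runnable timer = () -> {
            for (int i = 0; i < callsPerThread; i++) {
                updater.doUpdates(1);
            }
        };
        Thread c1 = new Thread(timer, "C1-timer");
        Thread c2 = new Thread(timer, "C2-timer");
        c1.start();
        c2.start();
        try {
            c1.join();
            c2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return updater.published();
    }
}
```

Without the synchronized block in doUpdates(), the two threads could interleave inside incr() and update(), and the published total would no longer be guaranteed to match the number of calls.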

          Amareshwari Sriramadasu added a comment -

          Some simple experiments we have done:

          1. We brought up a 500-node cluster with CompositeContext containing contexts C1 and C2. The metric for the number of trackers is correct in C1 (same number as in the web UI) and wrong in C2.
          2. We brought up a 500-node cluster with CompositeContext containing contexts C2 and C1 (the order of contexts interchanged from the earlier run). Then the metric for the number of trackers is correct in C2 (same number as in the web UI) and wrong in C1.

          Here, the code path in which the metric for the number of trackers is incremented is JobTracker.addNewTracker(), called whenever a new tracker is added. Since the first context's value matches the one on the web UI, there is no bug in the JobTracker updating code. There is also no bug in the individual context implementations, because it is always the second context that shows wrong values.
          Thus, the remaining possibility is a bug in the metrics framework, i.e. CompositeContext.


            People

            • Assignee:
              Luke Lu
              Reporter:
              Amareshwari Sriramadasu
            • Votes:
              1
              Watchers:
              7
