Hadoop Common
  1. Hadoop Common
  2. HADOOP-9504

MetricsDynamicMBeanBase has concurrency issues in createMBeanInfo

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 2.0.3-alpha
    • Fix Version/s: 2.1.0-beta, 0.23.8, 1.2.1
    • Component/s: metrics
    • Labels:
      None

      Description

      Please see HBASE-8416 for detail information.
      we need to take care of the synchronization for HashMap put(), otherwise it may lead to spin loop.

      1. HADOOP-9504-branch-1.txt
        1 kB
        Jason Lowe
      2. HADOOP-9504-v2.txt
        1 kB
        Liang Xie
      3. HADOOP-9504.txt
        1 kB
        Liang Xie

        Issue Links

          Activity

          Hide
          Liang Xie added a comment -

          Just one line change, no extra test case be introduced, IMHO, it's OK?

          Show
          Liang Xie added a comment - Just one line change, no extra test case be introduced, IMHO, it's OK?
          Hide
          Jason Lowe added a comment -

          Is this really the right fix? As I understand it, the problem was caused by two or more threads racing in createMBeanInfo() and that's what clobbered a HashMap causing an infinite loop on a subsequent get. I agree that making it a ConcurrentHashMap will prevent the infinite loop, but will the resulting map contain what we want? As is, it could contain a hodgepodge of puts from different threads which seems bad.

          createMBeanInfo() creates a fresh map upon each invocation, and I think the real issue is that it makes the new map available to other threads via metricsRateAttributeMod before the map has finished being initialized. I think we should build the map into a local variable then advertise it only after it has been completely filled in. That way metricsRateAttributeMod is always referring to a consistent map that isn't being actively modified. The only writes to the map are in createMBeanInfo(), and therefore we may not need a ConcurrentHashMap in that case.

          Show
          Jason Lowe added a comment - Is this really the right fix? As I understand it, the problem was caused by two or more threads racing in createMBeanInfo() and that's what clobbered a HashMap causing an infinite loop on a subsequent get. I agree that making it a ConcurrentHashMap will prevent the infinite loop, but will the resulting map contain what we want? As is, it could contain a hodgepodge of puts from different threads which seems bad. createMBeanInfo() creates a fresh map upon each invocation, and I think the real issue is that it makes the new map available to other threads via metricsRateAttributeMod before the map has finished being initialized. I think we should build the map into a local variable then advertise it only after it has been completely filled in. That way metricsRateAttributeMod is always referring to a consistent map that isn't being actively modified. The only writes to the map are in createMBeanInfo(), and therefore we may not need a ConcurrentHashMap in that case.
          Hide
          Elliott Clark added a comment -

          Any chance of getting whichever fix is best backported to branch-1 ? For HBase this can really put a regionserver into a world of hurt. And there are still a fair number of hbase users on older branches. Seems like a 1.0.5 and/or 1.1.3 release would be helpful for users.

          Show
          Elliott Clark added a comment - Any chance of getting whichever fix is best backported to branch-1 ? For HBase this can really put a regionserver into a world of hurt. And there are still a fair number of hbase users on older branches. Seems like a 1.0.5 and/or 1.1.3 release would be helpful for users.
          Hide
          Liang Xie added a comment -

          Jason Lowe, i have got your concern, yes, it's reasonable.
          let's move metricsRateAttributeMod initialization into ctor, we could ensure "metricsRateAttributeMod is always referring to a consistent map" in this way, without obvious performance penalty.

          Show
          Liang Xie added a comment - Jason Lowe , i have got your concern, yes, it's reasonable. let's move metricsRateAttributeMod initialization into ctor, we could ensure "metricsRateAttributeMod is always referring to a consistent map" in this way, without obvious performance penalty.
          Hide
          Jason Lowe added a comment -

          Thanks, Liang, but that doesn't resolve the issue. Threads can race through createMBeanInfo, and since all threads would be updating the same map, the set of puts from one thread could be partially clobbered by another racing thread resulting in an inconsistent map when all threads have left createMBeanInfo. I believe creating a separate map is the proper behavior, but a thread simply shouldn't publish the map to other threads (via the metricsRateAttributeMod field) until the map has been completely filled in by that thread.

          Show
          Jason Lowe added a comment - Thanks, Liang, but that doesn't resolve the issue. Threads can race through createMBeanInfo, and since all threads would be updating the same map, the set of puts from one thread could be partially clobbered by another racing thread resulting in an inconsistent map when all threads have left createMBeanInfo. I believe creating a separate map is the proper behavior, but a thread simply shouldn't publish the map to other threads (via the metricsRateAttributeMod field) until the map has been completely filled in by that thread.
          Hide
          Jason Lowe added a comment -

          Updating the summary to describe the issue rather than the solution.

          Show
          Jason Lowe added a comment - Updating the summary to describe the issue rather than the solution.
          Hide
          Liang Xie added a comment -

          the set of puts from one thread could be partially clobbered by another racing thread resulting in an inconsistent map when all threads have left createMBeanInfo

          IMHO, if we use ConcurrentHashMap, that will not happen, ConcurrentHashMap has a internal synchronize lock to prevent.

          Show
          Liang Xie added a comment - the set of puts from one thread could be partially clobbered by another racing thread resulting in an inconsistent map when all threads have left createMBeanInfo IMHO, if we use ConcurrentHashMap, that will not happen, ConcurrentHashMap has a internal synchronize lock to prevent.
          Hide
          Jason Lowe added a comment -

          ConcurrentHashMap doesn't prevent the kind of corruption I was referring to. Take the following scenario:

          Threads A and B both want to publish to the map. A wants to publish X:1 and Y:2 while B wants to publish X:3 and Y:4. If we create the map in the constructor and never replace it, then threads racing can end up with a map that is X:3, Y:1 which matches neither thread's view and thus is inconsistent relative to the two views the threads wanted to publish. Creating separate maps in each thread and publishing them as an entire map at once would prevent that.

          That being said, it looks like the racing threads in createMBeanInfo are trying to publish the same values to the map, so the type of corruption I was worried about should not occur. My apologies for not noticing that earlier.

          Show
          Jason Lowe added a comment - ConcurrentHashMap doesn't prevent the kind of corruption I was referring to. Take the following scenario: Threads A and B both want to publish to the map. A wants to publish X:1 and Y:2 while B wants to publish X:3 and Y:4. If we create the map in the constructor and never replace it, then threads racing can end up with a map that is X:3, Y:1 which matches neither thread's view and thus is inconsistent relative to the two views the threads wanted to publish. Creating separate maps in each thread and publishing them as an entire map at once would prevent that. That being said, it looks like the racing threads in createMBeanInfo are trying to publish the same values to the map, so the type of corruption I was worried about should not occur. My apologies for not noticing that earlier.
          Hide
          Liang Xie added a comment -

          the racing threads in createMBeanInfo are trying to publish the same values to the map

          yeh, you got it

          Show
          Liang Xie added a comment - the racing threads in createMBeanInfo are trying to publish the same values to the map yeh, you got it
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12580652/HADOOP-9504-v2.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2484//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2484//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12580652/HADOOP-9504-v2.txt against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2484//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2484//console This message is automatically generated.
          Hide
          Jason Lowe added a comment -

          +1, will commit this shortly.

          Show
          Jason Lowe added a comment - +1, will commit this shortly.
          Hide
          Jason Lowe added a comment -

          Thanks, Liang! I committed this to trunk, branch-2, and branch-0.23.

          Show
          Jason Lowe added a comment - Thanks, Liang! I committed this to trunk, branch-2, and branch-0.23.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3675 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3675/)
          HADOOP-9504. MetricsDynamicMBeanBase has concurrency issues in createMBeanInfo. Contributed by Liang Xie (Revision 1476487)

          Result = SUCCESS
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476487
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics/util/MetricsDynamicMBeanBase.java
          Show
          Hudson added a comment - Integrated in Hadoop-trunk-Commit #3675 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3675/ ) HADOOP-9504 . MetricsDynamicMBeanBase has concurrency issues in createMBeanInfo. Contributed by Liang Xie (Revision 1476487) Result = SUCCESS jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476487 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics/util/MetricsDynamicMBeanBase.java
          Hide
          Jason Lowe added a comment -

          Attaching patch for branch-1, someone have a look-see and commit if deemed OK.

          Matt Foley, is this important enough to trigger a new patch release for 1.x?

          Show
          Jason Lowe added a comment - Attaching patch for branch-1, someone have a look-see and commit if deemed OK. Matt Foley , is this important enough to trigger a new patch release for 1.x?
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #196 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/196/)
          HADOOP-9504. MetricsDynamicMBeanBase has concurrency issues in createMBeanInfo. Contributed by Liang Xie (Revision 1476487)

          Result = SUCCESS
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476487
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics/util/MetricsDynamicMBeanBase.java
          Show
          Hudson added a comment - Integrated in Hadoop-Yarn-trunk #196 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/196/ ) HADOOP-9504 . MetricsDynamicMBeanBase has concurrency issues in createMBeanInfo. Contributed by Liang Xie (Revision 1476487) Result = SUCCESS jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476487 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics/util/MetricsDynamicMBeanBase.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #594 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/594/)
          svn merge -c 1476487 FIXES: HADOOP-9504. MetricsDynamicMBeanBase has concurrency issues in createMBeanInfo. Contributed by Liang Xie (Revision 1476489)

          Result = FAILURE
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476489
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics/util/MetricsDynamicMBeanBase.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Build #594 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/594/ ) svn merge -c 1476487 FIXES: HADOOP-9504 . MetricsDynamicMBeanBase has concurrency issues in createMBeanInfo. Contributed by Liang Xie (Revision 1476489) Result = FAILURE jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476489 Files : /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics/util/MetricsDynamicMBeanBase.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1385 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1385/)
          HADOOP-9504. MetricsDynamicMBeanBase has concurrency issues in createMBeanInfo. Contributed by Liang Xie (Revision 1476487)

          Result = FAILURE
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476487
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics/util/MetricsDynamicMBeanBase.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1385 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1385/ ) HADOOP-9504 . MetricsDynamicMBeanBase has concurrency issues in createMBeanInfo. Contributed by Liang Xie (Revision 1476487) Result = FAILURE jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476487 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics/util/MetricsDynamicMBeanBase.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1412 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1412/)
          HADOOP-9504. MetricsDynamicMBeanBase has concurrency issues in createMBeanInfo. Contributed by Liang Xie (Revision 1476487)

          Result = SUCCESS
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476487
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics/util/MetricsDynamicMBeanBase.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1412 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1412/ ) HADOOP-9504 . MetricsDynamicMBeanBase has concurrency issues in createMBeanInfo. Contributed by Liang Xie (Revision 1476487) Result = SUCCESS jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476487 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics/util/MetricsDynamicMBeanBase.java
          Hide
          stack added a comment -

          Matt Foley To add to Jason Lowe 's petition, while this may not alone be significant enough to trigger a 1.* patch release, I would suggest it important enough to include in any coming 1.* release. We have experienced a number of production hbase installs beyond the one described in hbase-8416 where a server starts using all CPUs though effectively idle requiring ops intervention to do a node decommission and restart. Thanks.

          Show
          stack added a comment - Matt Foley To add to Jason Lowe 's petition, while this may not alone be significant enough to trigger a 1.* patch release, I would suggest it important enough to include in any coming 1.* release. We have experienced a number of production hbase installs beyond the one described in hbase-8416 where a server starts using all CPUs though effectively idle requiring ops intervention to do a node decommission and restart. Thanks.
          Hide
          Matt Foley added a comment -

          I agree. I propose to put it in 1.2.1 in about 3 weeks, if that is okay with everyone.

          Show
          Matt Foley added a comment - I agree. I propose to put it in 1.2.1 in about 3 weeks, if that is okay with everyone.
          Hide
          Jason Lowe added a comment -

          Sounds reasonable to me. If you feel the patch for branch-1 is ready to go, please commit it to the appropriate branch(es). Thanks!

          Show
          Jason Lowe added a comment - Sounds reasonable to me. If you feel the patch for branch-1 is ready to go, please commit it to the appropriate branch(es). Thanks!
          Hide
          Matt Foley added a comment - - edited

          Reopened for submission to branch-1.2, per Jason Lowe.

          Show
          Matt Foley added a comment - - edited Reopened for submission to branch-1.2, per Jason Lowe .
          Hide
          Bryan Beaudreault added a comment -

          Any update on this being merged? The impact to an hbase regionserver is pretty substantial when this happens.

          Show
          Bryan Beaudreault added a comment - Any update on this being merged? The impact to an hbase regionserver is pretty substantial when this happens.
          Hide
          stack added a comment -

          Matt Foley Would you like me commit? If so, just say. Thanks.

          Show
          stack added a comment - Matt Foley Would you like me commit? If so, just say. Thanks.
          Hide
          Matt Foley added a comment -

          Committed to branch-1 and branch-1.2. Thanks, Liang and Jason!

          Show
          Matt Foley added a comment - Committed to branch-1 and branch-1.2. Thanks, Liang and Jason!

            People

            • Assignee:
              Liang Xie
              Reporter:
              Liang Xie
            • Votes:
              0 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development