Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- 2.7.4, 2.6.6
Description
HDFS-11180 fixed a general class of deadlock between MetricsSystemImpl and FSEditLog that can occur during failover (see the comments on that JIRA for more details). In trunk and branch-2/branch-2.8 the fix succeeded by making the metrics calls not synchronize on FSEditLog.
In branch-2.6 and branch-2.7 there is one more method, FSNamesystem#getTransactionsSinceLastCheckpoint, which still requires the lock on FSEditLog and thus can result in the same deadlock scenario. This can be seen by running TestFSNamesystemMBean#testWithFSEditLogLock with the patch in HDFS-11290 on either of these branches (it fails currently).
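To make the idea concrete, here is a minimal sketch of a metrics getter that does not need the edit log's monitor. It is not the patch for this issue and not Hadoop code: `EditLogSketch`, its fields, and `metricReadableWhileLocked` are hypothetical names, and using `AtomicLong` counters is an assumed technique chosen only for illustration. The helper checks that the getter still returns while another thread holds the monitor, which is the property a metrics thread needs to stay out of this kind of deadlock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for an edit log; NOT the real FSEditLog.
class EditLogSketch {
    private final AtomicLong lastWrittenTxId = new AtomicLong();
    private final AtomicLong lastCheckpointTxId = new AtomicLong();

    // Writers still serialize on the edit log's monitor.
    synchronized long logEdit() {
        return lastWrittenTxId.incrementAndGet();
    }

    synchronized void checkpoint() {
        lastCheckpointTxId.set(lastWrittenTxId.get());
    }

    // Metrics getter: reads the atomics only and never takes the monitor.
    long getTransactionsSinceLastCheckpoint() {
        return lastWrittenTxId.get() - lastCheckpointTxId.get();
    }

    // Returns true if the getter completes while another thread holds the monitor.
    static boolean metricReadableWhileLocked(EditLogSketch log) {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (log) {
                held.countDown();
                try {
                    release.await(); // keep the monitor until told to let go
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        Thread metrics = new Thread(() -> {
            log.getTransactionsSinceLastCheckpoint();
            done.countDown();
        });
        metrics.setDaemon(true);
        try {
            holder.start();
            held.await();    // the monitor is now held by the other thread
            metrics.start();
            return done.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();
        }
    }

    public static void main(String[] args) {
        EditLogSketch log = new EditLogSketch();
        log.logEdit();
        log.logEdit();
        log.checkpoint();
        log.logEdit();
        System.out.println(log.getTransactionsSinceLastCheckpoint());
        System.out.println(metricReadableWhileLocked(log));
    }
}
```

One trade-off in this sketch: the two counters are read separately, so a reading taken while a writer is active may be slightly stale. That is usually acceptable for a metric, but it is an assumption here rather than a statement about how the branches were fixed.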
Attachments
Issue Links
- is related to
  - HDFS-11180 Intermittent deadlock in NameNode when failover happens. (Resolved)
  - HDFS-11290 TestFSNameSystemMBean should wait until JMX cache is cleared (Resolved)