Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-4186

logSync() is called with the write lock held while releasing lease

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.4, 2.0.2-alpha
    • Fix Version/s: 3.0.0, 2.0.3-alpha, 0.23.5, 0.23.6
    • Component/s: namenode
    • Labels:
      None

      Description

      As pointed out in HDFS-4138, when the lease monitor calls internalReleaseLease(), it acquires the namespace write lock. Inside internalReleaseLease(), if a block recovery is needed, the lease is reassigned to the namenode itself and this is logged & synced in logReassignLease().

      Since this is done while the write lock is held, log syncing is blocked. When a large number of leases are expired and blocks are recovered, namenode can slow down.

        Issue Links

          Activity

          Vinayakumar B made changes -
          Link This issue relates to HDFS-4777 [ HDFS-4777 ]
          Suresh Srinivas made changes -
          Link This issue relates to HDFS-4479 [ HDFS-4479 ]
          Thomas Graves made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1259 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1259/)
          HDFS-4186. logSync() is called with the write lock held while releasing lease (Kihwal Lee via daryn) (Revision 1409988)

          Result = FAILURE
          daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1409988
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1259 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1259/ ) HDFS-4186 . logSync() is called with the write lock held while releasing lease (Kihwal Lee via daryn) (Revision 1409988) Result = FAILURE daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1409988 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1228 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1228/)
          HDFS-4186. logSync() is called with the write lock held while releasing lease (Kihwal Lee via daryn) (Revision 1409988)

          Result = SUCCESS
          daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1409988
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1228 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1228/ ) HDFS-4186 . logSync() is called with the write lock held while releasing lease (Kihwal Lee via daryn) (Revision 1409988) Result = SUCCESS daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1409988 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #437 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/437/)
          HDFS-4186. logSync() is called with the write lock held while releasing lease (Revision 1409994)

          Result = SUCCESS
          daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1409994
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Build #437 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/437/ ) HDFS-4186 . logSync() is called with the write lock held while releasing lease (Revision 1409994) Result = SUCCESS daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1409994 Files : /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #38 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/38/)
          HDFS-4186. logSync() is called with the write lock held while releasing lease (Kihwal Lee via daryn) (Revision 1409988)

          Result = SUCCESS
          daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1409988
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          Show
          Hudson added a comment - Integrated in Hadoop-Yarn-trunk #38 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/38/ ) HDFS-4186 . logSync() is called with the write lock held while releasing lease (Kihwal Lee via daryn) (Revision 1409988) Result = SUCCESS daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1409988 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3028 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3028/)
          HDFS-4186. logSync() is called with the write lock held while releasing lease (Kihwal Lee via daryn) (Revision 1409988)

          Result = SUCCESS
          daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1409988
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          Show
          Hudson added a comment - Integrated in Hadoop-trunk-Commit #3028 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3028/ ) HDFS-4186 . logSync() is called with the write lock held while releasing lease (Kihwal Lee via daryn) (Revision 1409988) Result = SUCCESS daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1409988 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          Daryn Sharp made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 3.0.0 [ 12320356 ]
          Fix Version/s 2.0.3-alpha [ 12323274 ]
          Fix Version/s 0.23.5 [ 12323313 ]
          Fix Version/s 0.23.6 [ 12323503 ]
          Resolution Fixed [ 1 ]
          Hide
          Daryn Sharp added a comment -

          I've checked this into trunk, branch 2, 23, and 23.5. Thanks Kihwal!

          Show
          Daryn Sharp added a comment - I've checked this into trunk, branch 2, 23, and 23.5. Thanks Kihwal!
          Hide
          Todd Lipcon added a comment -

          +1, looks good. Thanks guys.

          Show
          Todd Lipcon added a comment - +1, looks good. Thanks guys.
          Kihwal Lee made changes -
          Target Version/s 3.0.0, 2.0.3-alpha, 0.23.5 [ 12320356, 12323274, 12323313 ] 3.0.0, 2.0.3-alpha, 0.23.6 [ 12320356, 12323274, 12323503 ]
          Hide
          Daryn Sharp added a comment -

          Looks pretty straightforward. +1, conditional on Todd's approval of the patch.

          Show
          Daryn Sharp added a comment - Looks pretty straightforward. +1, conditional on Todd's approval of the patch.
          Hide
          Kihwal Lee added a comment -

          The test failure was not caused by the patch, but by a recent commit of HADOOP-9042. In their precommit build, only the tests in common ran, but a test in HDFS has dependency on it. Reopned HADOOP-9042 and left a comment.

          Show
          Kihwal Lee added a comment - The test failure was not caused by the patch, but by a recent commit of HADOOP-9042 . In their precommit build, only the tests in common ran, but a test in HDFS has dependency on it. Reopned HADOOP-9042 and left a comment.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12553607/hdfs-4186-trunk-skip-standbyException.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestHDFSFileSystemContract

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3512//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3512//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12553607/hdfs-4186-trunk-skip-standbyException.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHDFSFileSystemContract +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3512//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3512//console This message is automatically generated.
          Kihwal Lee made changes -
          Attachment hdfs-4186-branch-0.23-inaccurate-batched-sync-count.patch [ 12553576 ]
          Kihwal Lee made changes -
          Attachment hdfs-4186-trunk-inaccurate-batched-sync-count.patch [ 12553577 ]
          Kihwal Lee made changes -
          Attachment hdfs-4186-trunk-inaccurate-batched-sync-count.patch [ 12553585 ]
          Kihwal Lee made changes -
          Hide
          Kihwal Lee added a comment -

          branch 0.23 does not need new patch since it does not have HA.

          Show
          Kihwal Lee added a comment - branch 0.23 does not need new patch since it does not have HA.
          Hide
          Kihwal Lee added a comment -

          About the test failures, the original patch doesn't have this problem, it's hitting these after taking out the thread local var. This is because logSync() is called even in StandbyException. I will make them skip logSync() in case of StandbyException.

          Show
          Kihwal Lee added a comment - About the test failures, the original patch doesn't have this problem, it's hitting these after taking out the thread local var. This is because logSync() is called even in StandbyException. I will make them skip logSync() in case of StandbyException.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12553585/hdfs-4186-trunk-inaccurate-batched-sync-count.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3507//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3507//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12553585/hdfs-4186-trunk-inaccurate-batched-sync-count.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3507//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3507//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12553577/hdfs-4186-trunk-inaccurate-batched-sync-count.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3506//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3506//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12553577/hdfs-4186-trunk-inaccurate-batched-sync-count.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3506//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3506//console This message is automatically generated.
          Hide
          Daryn Sharp added a comment -

          It looks like the 23 patch is updated, but the trunk patch isn't?

          Show
          Daryn Sharp added a comment - It looks like the 23 patch is updated, but the trunk patch isn't?
          Hide
          Kihwal Lee added a comment -

          As suggested by Todd, I filed HDFS-4191 to add the per-thread transaction tracking, which is removed from the latest patch for this jira.

          Show
          Kihwal Lee added a comment - As suggested by Todd, I filed HDFS-4191 to add the per-thread transaction tracking, which is removed from the latest patch for this jira.
          Kihwal Lee made changes -
          Attachment hdfs-4186-branch-0.23-inaccurate-batched-sync-count.patch [ 12553584 ]
          Attachment hdfs-4186-trunk-inaccurate-batched-sync-count.patch [ 12553585 ]
          Hide
          Kihwal Lee added a comment -

          Updated patches address Daryn's comment.

          Show
          Kihwal Lee added a comment - Updated patches address Daryn's comment.
          Hide
          Daryn Sharp added a comment -

          I think the patch looks generally good, except when it bails out early on if (!oldest.expiredHardLimit()). I think it should return needSync instead of false? Minor nit is "happenned" is misspelled.

          Maybe I'm being too nit picky, but I'd rather see the optimization around skipping logSync when nothing has been logged separate from this bug fix for lease recovery.

          While we could break this into another jira if you feel strongly, I think the latest patch seems pretty simple in nature. I think it's esp. good in the sense of not generating bad metrics. If we are going to track a number, it should probably be an accurate number. I think that was a pretty good catch by Kihwal.

          Show
          Daryn Sharp added a comment - I think the patch looks generally good, except when it bails out early on if (!oldest.expiredHardLimit()) . I think it should return needSync instead of false? Minor nit is "happenned" is misspelled. Maybe I'm being too nit picky, but I'd rather see the optimization around skipping logSync when nothing has been logged separate from this bug fix for lease recovery. While we could break this into another jira if you feel strongly, I think the latest patch seems pretty simple in nature. I think it's esp. good in the sense of not generating bad metrics. If we are going to track a number, it should probably be an accurate number. I think that was a pretty good catch by Kihwal.
          Kihwal Lee made changes -
          Attachment hdfs-4186-branch-0.23-inaccurate-batched-sync-count.patch [ 12553576 ]
          Attachment hdfs-4186-trunk-inaccurate-batched-sync-count.patch [ 12553577 ]
          Hide
          Kihwal Lee added a comment -

          The new patches are without the extra thread local variable. The new tracking in lease monitor should make the error in batched syncs count low by calling logSync() only when necessary.

          Show
          Kihwal Lee added a comment - The new patches are without the extra thread local variable. The new tracking in lease monitor should make the error in batched syncs count low by calling logSync() only when necessary.
          Hide
          Kihwal Lee added a comment -

          If ppl don't have problem with incorrect batched sync metrics, I will take it out.

          Show
          Kihwal Lee added a comment - If ppl don't have problem with incorrect batched sync metrics, I will take it out.
          Hide
          Todd Lipcon added a comment -

          Maybe I'm being too nit picky, but I'd rather see the optimization around skipping logSync when nothing has been logged separate from this bug fix for lease recovery. There have been enough synchronization bugs around the FSEditLog code paths that I think it deserves its own JIRA (which we could cautiously put it only 3.0 at first, for example). I haven't generally seen the FSEditLog monitor be a point of contention, since it's only held for very short amounts of time, so the value of the optimization seems low compared to the added complexity.

          Show
          Todd Lipcon added a comment - Maybe I'm being too nit picky, but I'd rather see the optimization around skipping logSync when nothing has been logged separate from this bug fix for lease recovery. There have been enough synchronization bugs around the FSEditLog code paths that I think it deserves its own JIRA (which we could cautiously put it only 3.0 at first, for example). I haven't generally seen the FSEditLog monitor be a point of contention, since it's only held for very short amounts of time, so the value of the optimization seems low compared to the added complexity.
          Hide
          Kihwal Lee added a comment -

          I reran all the failed tests. They fail with HDFS-4171 and they pass without HDFS-4171.

          Show
          Kihwal Lee added a comment - I reran all the failed tests. They fail with HDFS-4171 and they pass without HDFS-4171 .
          Hide
          Kihwal Lee added a comment -

          These tests started failing since last night. I am guessing HDFS-4171.
          https://builds.apache.org/view/Hadoop/job/Hadoop-Hdfs-trunk/1226/

          Show
          Kihwal Lee added a comment - These tests started failing since last night. I am guessing HDFS-4171 . https://builds.apache.org/view/Hadoop/job/Hadoop-Hdfs-trunk/1226/
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12553453/hdfs-4186-trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.web.TestWebHDFS
          org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
          org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
          org.apache.hadoop.hdfs.TestQuota
          org.apache.hadoop.hdfs.web.TestFSMainOperationsWebHdfs
          org.apache.hadoop.hdfs.TestDistributedFileSystem
          org.apache.hadoop.hdfs.security.TestDelegationToken
          org.apache.hadoop.fs.TestFcHdfsSymlink
          org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
          org.apache.hadoop.hdfs.security.TestDelegationTokenForProxyUser

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3500//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3500//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12553453/hdfs-4186-trunk.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHDFS org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes org.apache.hadoop.hdfs.server.namenode.TestAuditLogs org.apache.hadoop.hdfs.TestQuota org.apache.hadoop.hdfs.web.TestFSMainOperationsWebHdfs org.apache.hadoop.hdfs.TestDistributedFileSystem org.apache.hadoop.hdfs.security.TestDelegationToken org.apache.hadoop.fs.TestFcHdfsSymlink org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.security.TestDelegationTokenForProxyUser +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3500//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3500//console This message is automatically generated.
          Kihwal Lee made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Kihwal Lee added a comment -

          It's modtly an optimization. Since several methods are now calling logSync() even if there is nothing to sync,it allows returning early before hitting the edit log lock. There is no correctness problem without it in terms of edit logging, but the metric for batched sync will be inaccurate.

          Show
          Kihwal Lee added a comment - It's modtly an optimization. Since several methods are now calling logSync() even if there is nothing to sync,it allows returning early before hitting the edit log lock. There is no correctness problem without it in terms of edit logging, but the metric for batched sync will be inaccurate.
          Hide
          Todd Lipcon added a comment -

          Hey Kihwal. Can you explain why the new ThreadLocal is necessary?

          Without the new ThreadLocal, the thread would just have a too-low txid, which would already be synced from some previous call on the same IPC handler. So the logSync would already end up a no-op, given the synced txid already will be higher than this point.

          Show
          Todd Lipcon added a comment - Hey Kihwal. Can you explain why the new ThreadLocal is necessary? Without the new ThreadLocal, the thread would just have a too-low txid, which would already be synced from some previous call on the same IPC handler. So the logSync would already end up a no-op, given the synced txid already will be higher than this point.
          Kihwal Lee made changes -
          Attachment hdfs-4186-trunk.patch [ 12553453 ]
          Hide
          Kihwal Lee added a comment -

          The patch makes logSync() deferred a bit for lease reassignment. From several methods in FSNamesapce that may cause lease reassignment are now calling logSync() in their finally block after releasing the write lock. It needs to be in a finally block, since lease recovery can throw an exception after lease reassignment.

          logSync() was modified to look at a new thread local variable for determining whether there is any logged transactions by the thread. When this variable is false, it returns right away. This variable is set whenever logEdit() is called and cleared when the log is synced. This is also unconditionally set for logSyncAll(). This does not indicate the sync state. I.e. even if it is true, another thread might have already synced all transactions by this thread.

          In the lease monitor thread, logSync() is called after checkLeases() is done and the write lock is released. All transactions logged in checkLeases() will be batched.

          Show
          Kihwal Lee added a comment - The patch makes logSync() deferred a bit for lease reassignment. From several methods in FSNamesapce that may cause lease reassignment are now calling logSync() in their finally block after releasing the write lock. It needs to be in a finally block, since lease recovery can throw an exception after lease reassignment. logSync() was modified to look at a new thread local variable for determining whether there is any logged transactions by the thread. When this variable is false, it returns right away. This variable is set whenever logEdit() is called and cleared when the log is synced. This is also unconditionally set for logSyncAll(). This does not indicate the sync state. I.e. even if it is true, another thread might have already synced all transactions by this thread. In the lease monitor thread, logSync() is called after checkLeases() is done and the write lock is released. All transactions logged in checkLeases() will be batched.
          Kihwal Lee made changes -
          Field Original Value New Value
          Assignee Kihwal Lee [ kihwal ]
          Kihwal Lee created issue -

            People

            • Assignee:
              Kihwal Lee
              Reporter:
              Kihwal Lee
            • Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development