Hadoop HDFS / HDFS-4621

Additional logging to help diagnose slow QJM logSync

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.3-alpha
    • Fix Version/s: 2.1.0-beta
    • Component/s: ha, qjm
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      I've been diagnosing an issue on a cluster that occasionally sees slow logSync calls to QJM. A few more pieces of logging would help:

      • in the warning messages on the client side leading up to a timeout, include which nodes have responded and which ones are still pending
      • on the server side, when we actually call FileChannel.force, log a warning if the sync takes longer than 1 second
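
      The two items above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from the attached patch: the class name SlowSyncLogging, the helpers timedForce and describeQuorum, the node names, and the use of force(false) and System.err are all assumptions made for the example. Only the 1-second warning threshold and the use of FileChannel.force come from the description.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.TimeUnit;

public class SlowSyncLogging {
    // Threshold from the description: warn if a sync takes longer than 1 second.
    static final long WARN_SYNC_MILLIS = 1000;

    /** Server side: force the channel to disk, warn if slow, return elapsed ms. */
    static long timedForce(FileChannel channel) throws IOException {
        long start = System.nanoTime();
        // Assumption: flush file content only (metaData = false).
        channel.force(false);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        if (elapsedMs > WARN_SYNC_MILLIS) {
            // A real server would use its logging framework here.
            System.err.println("WARN: sync took " + elapsedMs + "ms");
        }
        return elapsedMs;
    }

    /** Client side: build the part of a warning that lists responded and pending nodes. */
    static String describeQuorum(Set<String> allNodes, Set<String> responded) {
        Set<String> pending = new TreeSet<>(allNodes);
        pending.removeAll(responded);
        // TreeSet keeps the output in a stable, sorted order.
        return "responded=" + new TreeSet<>(responded) + " pending=" + pending;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("edits", ".log");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap("txn".getBytes()));
            timedForce(ch);
        } finally {
            Files.deleteIfExists(tmp);
        }
        // Hypothetical journal node names.
        System.out.println(describeQuorum(Set.of("jn1", "jn2", "jn3"), Set.of("jn2")));
    }
}
```

      Measuring with System.nanoTime rather than wall-clock time keeps the elapsed value unaffected by clock adjustments, which matters when the point of the log line is to attribute latency to the disk.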

      Attachments

        1. hdfs-4621.txt
          5 kB
          Todd Lipcon

        Activity

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12574740/hdfs-4621.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestLeaseRecovery2

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4131//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4131//console

          This message is automatically generated.

          atm Aaron Myers added a comment -

          Patch looks good to me. I'm pretty confident the test failure is unrelated.

          My only suggestion would be to maybe make the logging thresholds configurable, or a percentage of the related timeout if there is one, but you can take it or leave it.

          +1
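
          The "percentage of the related timeout" idea could look something like the sketch below. It is purely illustrative: the class WarnThreshold, the method warnThresholdMillis, and the 5% fraction are hypothetical and do not correspond to any existing Hadoop configuration key or API.

```java
public class WarnThreshold {
    /**
     * Derives a warning threshold as a fraction of an operation timeout.
     * Both parameters are hypothetical knobs, not existing Hadoop settings.
     */
    static long warnThresholdMillis(long timeoutMillis, double fraction) {
        if (timeoutMillis <= 0 || fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException(
                "timeout must be positive and fraction must be in (0, 1]");
        }
        return Math.round(timeoutMillis * fraction);
    }

    public static void main(String[] args) {
        // Example: warn at 5% of a 20-second timeout.
        System.out.println(warnThresholdMillis(20_000, 0.05));
    }
}
```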

          tlipcon Todd Lipcon added a comment -

          Thanks. I decided not to change the warning thresholds: they're set to 1 sec, which is long enough that you wouldn't expect any well-configured disk to take that long to fsync. Even if the timeout is 20 sec, you'd probably want to know if your I/O is taking more than tens of milliseconds, so 1 sec should be conservative. Committing to branch-2 and trunk.

          hudson Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3537 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3537/)
          HDFS-4621. Additional logging to help diagnose slow QJM syncs. Contributed by Todd Lipcon. (Revision 1461777)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1461777
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/IPCLoggerChannel.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumCall.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/Journal.java
          hudson Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #169 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/169/)
          HDFS-4621. Additional logging to help diagnose slow QJM syncs. Contributed by Todd Lipcon. (Revision 1461777)

          Result = FAILURE
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1461777
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/IPCLoggerChannel.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumCall.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/Journal.java
          hudson Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1358 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1358/)
          HDFS-4621. Additional logging to help diagnose slow QJM syncs. Contributed by Todd Lipcon. (Revision 1461777)

          Result = FAILURE
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1461777
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/IPCLoggerChannel.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumCall.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/Journal.java
          hudson Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1386 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1386/)
          HDFS-4621. Additional logging to help diagnose slow QJM syncs. Contributed by Todd Lipcon. (Revision 1461777)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1461777
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/IPCLoggerChannel.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumCall.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/Journal.java

          People

            Assignee: Todd Lipcon
            Reporter: Todd Lipcon
            Votes: 0
            Watchers: 6
