Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-11240

Print hdfs pipeline when hlog's sync is slow

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.99.0, 2.0.0
    • Component/s: Operability, wal
    • Labels:
      None
    • Release Note:
      Adds emission of a log message that lists datanodes in HDFS pipeline when sync is slow. Set when to log with "hbase.regionserver.hlog.slowsync.ms". Defaults logging if sync takes > 100ms.

      Description

      Sometimes the slow sync of hlog writer is because there is an abnormal datanode in the pipeline. So it will be helpful to print the pipeline of slow sync to diagnose those problems.

      The ultimate solution is to join the trace of HBase and HDFS.

      1. HBASE-11240-addendum.diff
        1 kB
        Liu Shaohui
      2. HBASE-11240-trunk-v1.diff
        4 kB
        Liu Shaohui
      3. HBASE-11240-trunk-v2.diff
        4 kB
        Liu Shaohui

        Issue Links

          Activity

          Hide
          smartzjp Smart Zhang added a comment -

          log,debug("begin call writer to write HLlog");
          this.syncFs.invoke(this.writer, HLog.NO_ARGS);
          log,debug("end call writer to write HLlog");
          ?

          Show
          smartzjp Smart Zhang added a comment - log,debug("begin call writer to write HLlog"); this.syncFs.invoke(this.writer, HLog.NO_ARGS); log,debug("end call writer to write HLlog"); ?
          Hide
          liushaohui Liu Shaohui added a comment -

          Smart Zhang
          Yes, print the hdfs pipeline here when the syncFs is slow.

          Show
          liushaohui Liu Shaohui added a comment - Smart Zhang Yes, print the hdfs pipeline here when the syncFs is slow.
          Hide
          liushaohui Liu Shaohui added a comment -

          Patch for trunk

          Show
          liushaohui Liu Shaohui added a comment - Patch for trunk
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12647594/HBASE-11240-trunk-v1.diff
          against trunk revision .
          ATTACHMENT ID: 12647594

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647594/HBASE-11240-trunk-v1.diff against trunk revision . ATTACHMENT ID: 12647594 +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 lineLengths . The patch does not introduce lines longer than 100 +1 site . The mvn site goal succeeds with this patch. +1 core tests . The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//console This message is automatically generated.
          Hide
          stack stack added a comment -

          Liu Shaohui Patch looks good to me. You fellas doing htrace? Will commit later today unless other comment/objection.

          Show
          stack stack added a comment - Liu Shaohui Patch looks good to me. You fellas doing htrace? Will commit later today unless other comment/objection.
          Hide
          stack stack added a comment -

          Or sorry, you note the long sync in trace... should it be logged too or you think that too much?

          Show
          stack stack added a comment - Or sorry, you note the long sync in trace... should it be logged too or you think that too much?
          Hide
          ndimiduk Nick Dimiduk added a comment -

          info-level logging makes sense here too.

          Show
          ndimiduk Nick Dimiduk added a comment - info-level logging makes sense here too.
          Hide
          liushaohui Liu Shaohui added a comment -

          Add the pipeline of slow sync in info log.
          Thanks the reviews of stack and Nick Dimiduk.

          What about the default slow sync threshold: 100ms?
          Is it too small? Too small threshold will bring too many info log.

          Show
          liushaohui Liu Shaohui added a comment - Add the pipeline of slow sync in info log. Thanks the reviews of stack and Nick Dimiduk . What about the default slow sync threshold: 100ms? Is it too small? Too small threshold will bring too many info log.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12648133/HBASE-11240-trunk-v2.diff
          against trunk revision .
          ATTACHMENT ID: 12648133

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12648133/HBASE-11240-trunk-v2.diff against trunk revision . ATTACHMENT ID: 12648133 +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 lineLengths . The patch does not introduce lines longer than 100 +1 site . The mvn site goal succeeds with this patch. +1 core tests . The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//console This message is automatically generated.
          Hide
          liushaohui Liu Shaohui added a comment -

          stack Nick Dimiduk
          Could you help to review the new patch? Thanks.
          It's very useful to debug the latency problem with pipeline info.

          Show
          liushaohui Liu Shaohui added a comment - stack Nick Dimiduk Could you help to review the new patch? Thanks. It's very useful to debug the latency problem with pipeline info.
          Hide
          stack stack added a comment -

          Applied to master. Enis Soztutar Ok to put this in 1.0? Operations.

          Show
          stack stack added a comment - Applied to master. Enis Soztutar Ok to put this in 1.0? Operations.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK #5258 (See https://builds.apache.org/job/HBase-TRUNK/5258/)
          HBASE-11240 Print hdfs pipeline when hlog's sync is slow (Liu Shaohui) (stack: rev ad78a9cfded3d65f83f4015120746e5c29436b18)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in HBase-TRUNK #5258 (See https://builds.apache.org/job/HBase-TRUNK/5258/ ) HBASE-11240 Print hdfs pipeline when hlog's sync is slow (Liu Shaohui) (stack: rev ad78a9cfded3d65f83f4015120746e5c29436b18) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
          Hide
          enis Enis Soztutar added a comment -

          This looks good for branch-1 as well. But in case of a slowdown (network, etc) this will put A LOT of entries to log. Can we safeguard it so that we do not oversaturate the log.

          Show
          enis Enis Soztutar added a comment - This looks good for branch-1 as well. But in case of a slowdown (network, etc) this will put A LOT of entries to log. Can we safeguard it so that we do not oversaturate the log.
          Hide
          enis Enis Soztutar added a comment -

          Liu Shaohui There is one more problem with the patch, that this line does ms -> nano conversion wrong. It should multiply by 1M rather than 1K.

          this.slowSyncNs =
                  1000 * conf.getInt("hbase.regionserver.hlog.slowsync.ms",
                    DEFAULT_SLOW_SYNC_TIME_MS);
          
          Show
          enis Enis Soztutar added a comment - Liu Shaohui There is one more problem with the patch, that this line does ms -> nano conversion wrong. It should multiply by 1M rather than 1K. this .slowSyncNs = 1000 * conf.getInt( "hbase.regionserver.hlog.slowsync.ms" , DEFAULT_SLOW_SYNC_TIME_MS);
          Hide
          enis Enis Soztutar added a comment -

          And

                String msg =
                    new StringBuilder().append("Slow sync cost: ")
                        .append(timeInNanos / 1000).append(" ms, current pipeline: ")
          
          Show
          enis Enis Soztutar added a comment - And String msg = new StringBuilder().append( "Slow sync cost: " ) .append(timeInNanos / 1000).append( " ms, current pipeline: " )
          Hide
          liushaohui Liu Shaohui added a comment -

          Thanks, Enis Soztutar
          Sorry about this stupid mistake.
          I will make a patch later to fix this bug.

          Show
          liushaohui Liu Shaohui added a comment - Thanks, Enis Soztutar Sorry about this stupid mistake. I will make a patch later to fix this bug.
          Hide
          liushaohui Liu Shaohui added a comment -

          addendum to fix the bug in transforming ns to ms

          Show
          liushaohui Liu Shaohui added a comment - addendum to fix the bug in transforming ns to ms
          Hide
          liushaohui Liu Shaohui added a comment -

          Enis Soztutar
          Would you help to review the addendum? Thanks

          Show
          liushaohui Liu Shaohui added a comment - Enis Soztutar Would you help to review the addendum? Thanks
          Hide
          stack stack added a comment -

          Committed to branch-1, both original patch and the addendum. Enis asked that we do something about too much log when slow. I see this happening in recent runs. Should be ameliorated by fix to timing calc but my guess is that we will still have 'too much'. Can address in a follow on patch.

          Thanks for the nice addition Liu Shaohui

          Show
          stack stack added a comment - Committed to branch-1, both original patch and the addendum. Enis asked that we do something about too much log when slow. I see this happening in recent runs. Should be ameliorated by fix to timing calc but my guess is that we will still have 'too much'. Can address in a follow on patch. Thanks for the nice addition Liu Shaohui
          Hide
          stack stack added a comment -

          Oh, I also applied the ADDENDUM to master.

          Show
          stack stack added a comment - Oh, I also applied the ADDENDUM to master.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-1.0 #43 (See https://builds.apache.org/job/HBase-1.0/43/)
          HBASE-11240 Print hdfs pipeline when hlog's sync is slow (Original patch + ADDENDUM) (stack: rev a6d271201f132b254613b1aa839a503bb61be4bc)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in HBase-1.0 #43 (See https://builds.apache.org/job/HBase-1.0/43/ ) HBASE-11240 Print hdfs pipeline when hlog's sync is slow (Original patch + ADDENDUM) (stack: rev a6d271201f132b254613b1aa839a503bb61be4bc) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
          Hide
          enis Enis Soztutar added a comment -

          Thanks Liu, Stack.

          Should be ameliorated by fix to timing calc but my guess is that we will still have 'too much'. Can address in a follow on patch.

          Yes, it would be awesome if we fix this as well.

          Show
          enis Enis Soztutar added a comment - Thanks Liu, Stack. Should be ameliorated by fix to timing calc but my guess is that we will still have 'too much'. Can address in a follow on patch. Yes, it would be awesome if we fix this as well.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK #5305 (See https://builds.apache.org/job/HBase-TRUNK/5305/)
          HBASE-11240 Print hdfs pipeline when hlog's sync is slow ADDENDUM (stack: rev c694ec11df3437b2ad24f365e3318dc7e931ddfc)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in HBase-TRUNK #5305 (See https://builds.apache.org/job/HBase-TRUNK/5305/ ) HBASE-11240 Print hdfs pipeline when hlog's sync is slow ADDENDUM (stack: rev c694ec11df3437b2ad24f365e3318dc7e931ddfc) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
          Hide
          enis Enis Soztutar added a comment -

          Closing this issue after 0.99.0 release.

          Show
          enis Enis Soztutar added a comment - Closing this issue after 0.99.0 release.
          Hide
          enis Enis Soztutar added a comment -

          Chia-Ping Tsai why did you assign this to yourself?

          Show
          enis Enis Soztutar added a comment - Chia-Ping Tsai why did you assign this to yourself?
          Hide
          chia7712 Chia-Ping Tsai added a comment -

          hi Enis Soztutar

          This is my fault and it is accidentally pressed.
          How can I do to restore the Assignee ?

          Thanks

          Show
          chia7712 Chia-Ping Tsai added a comment - hi Enis Soztutar This is my fault and it is accidentally pressed. How can I do to restore the Assignee ? Thanks
          Hide
          enis Enis Soztutar added a comment -

          No worries. Thanks Chia-Ping Tsai.

          Show
          enis Enis Soztutar added a comment - No worries. Thanks Chia-Ping Tsai .

            People

            • Assignee:
              liushaohui Liu Shaohui
              Reporter:
              liushaohui Liu Shaohui
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development