Details
-
Type:
Improvement
-
Status: Closed
-
Priority:
Major
-
Resolution: Fixed
-
Affects Version/s: None
-
Component/s: Operability, wal
-
Labels:None
-
Release Note:Adds emission of a log message that lists datanodes in HDFS pipeline when sync is slow. Set when to log with "hbase.regionserver.hlog.slowsync.ms". Defaults logging if sync takes > 100ms.
Description
Sometimes the slow sync of hlog writer is because there is an abnormal datanode in the pipeline. So it will be helpful to print the pipeline of slow sync to diagnose those problems.
The ultimate solution is to join the trace of HBase and HDFS.
-
- HBASE-11240-addendum.diff
- 1 kB
- Liu Shaohui
-
- HBASE-11240-trunk-v1.diff
- 4 kB
- Liu Shaohui
-
- HBASE-11240-trunk-v2.diff
- 4 kB
- Liu Shaohui
Issue Links
- relates to
-
HBASE-13000
Backport HBASE-11240 (Print hdfs pipeline when hlog's sync is slow) to 0.98
-
- Closed
-
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
Smart Zhang
Yes, print the hdfs pipeline here when the syncFs is slow.
Patch for trunk
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12647594/HBASE-11240-trunk-v1.diff
against trunk revision .
ATTACHMENT ID: 12647594
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 lineLengths. The patch does not introduce lines longer than 100
+1 site. The mvn site goal succeeds with this patch.
+1 core tests. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9650//console
This message is automatically generated.
Liu Shaohui Patch looks good to me. You fellas doing htrace? Will commit later today unless other comment/objection.
Or sorry, you note the long sync in trace... should it be logged too or you think that too much?
info-level logging makes sense here too.
Add the pipeline of slow sync in info log.
Thanks the reviews of stack and Nick Dimiduk.
What about the default slow sync threshold: 100ms?
Is it too small? Too small threshold will bring too many info log.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12648133/HBASE-11240-trunk-v2.diff
against trunk revision .
ATTACHMENT ID: 12648133
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 lineLengths. The patch does not introduce lines longer than 100
+1 site. The mvn site goal succeeds with this patch.
+1 core tests. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9676//console
This message is automatically generated.
stack Nick Dimiduk
Could you help to review the new patch? Thanks.
It's very useful to debug the latency problem with pipeline info.
Applied to master. Enis Soztutar Ok to put this in 1.0? Operations.
FAILURE: Integrated in HBase-TRUNK #5258 (See https://builds.apache.org/job/HBase-TRUNK/5258/)
HBASE-11240 Print hdfs pipeline when hlog's sync is slow (Liu Shaohui) (stack: rev ad78a9cfded3d65f83f4015120746e5c29436b18)
- hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
This looks good for branch-1 as well. But in case of a slowdown (network, etc) this will put A LOT of entries to log. Can we safeguard it so that we do not oversaturate the log.
Liu Shaohui There is one more problem with the patch, that this line does ms -> nano conversion wrong. It should multiply by 1M rather than 1K.
this.slowSyncNs = 1000 * conf.getInt("hbase.regionserver.hlog.slowsync.ms", DEFAULT_SLOW_SYNC_TIME_MS);
And
String msg = new StringBuilder().append("Slow sync cost: ") .append(timeInNanos / 1000).append(" ms, current pipeline: ")
Thanks, Enis Soztutar
Sorry about this stupid mistake.
I will make a patch later to fix this bug.
addendum to fix the bug in transforming ns to ms
Enis Soztutar
Would you help to review the addendum? Thanks
Committed to branch-1, both original patch and the addendum. Enis asked that we do something about too much log when slow. I see this happening in recent runs. Should be ameliorated by fix to timing calc but my guess is that we will still have 'too much'. Can address in a follow on patch.
Thanks for the nice addition Liu Shaohui
FAILURE: Integrated in HBase-1.0 #43 (See https://builds.apache.org/job/HBase-1.0/43/)
HBASE-11240 Print hdfs pipeline when hlog's sync is slow (Original patch + ADDENDUM) (stack: rev a6d271201f132b254613b1aa839a503bb61be4bc)
- hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
Thanks Liu, Stack.
Should be ameliorated by fix to timing calc but my guess is that we will still have 'too much'. Can address in a follow on patch.
Yes, it would be awesome if we fix this as well.
FAILURE: Integrated in HBase-TRUNK #5305 (See https://builds.apache.org/job/HBase-TRUNK/5305/)
HBASE-11240 Print hdfs pipeline when hlog's sync is slow ADDENDUM (stack: rev c694ec11df3437b2ad24f365e3318dc7e931ddfc)
- hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
Closing this issue after 0.99.0 release.
Chia-Ping Tsai why did you assign this to yourself?
This is my fault and it is accidentally pressed.
How can I do to restore the Assignee ?
Thanks
No worries. Thanks Chia-Ping Tsai.
log,debug("begin call writer to write HLlog");
this.syncFs.invoke(this.writer, HLog.NO_ARGS);
log,debug("end call writer to write HLlog");
?