[HBASE-13732] TestHBaseFsck#testParallelWithRetriesHbck fails intermittently - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.1.0, 1.2.0, 2.0.0
Fix Version/s: 1.2.0, 1.1.1, 2.0.0
Component/s: hbck, test
Labels: None

Description

TestHBaseFsck#testParallelWithRetriesHbck fails intermittently (especially in Windows environments) with "java.io.IOException: Duplicate hbck - Abort":

java.util.concurrent.ExecutionException: java.io.IOException: Duplicate hbck - Abort
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
	at java.util.concurrent.FutureTask.get(FutureTask.java:111)
	at org.apache.hadoop.hbase.util.TestHBaseFsck.testParallelWithRetriesHbck(TestHBaseFsck.java:644)
Caused by: java.io.IOException: Duplicate hbck - Abort
	at org.apache.hadoop.hbase.util.HBaseFsck.connect(HBaseFsck.java:484)
	at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:53)
	at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:43)
	at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:38)
	at org.apache.hadoop.hbase.util.TestHBaseFsck$2RunHbck.call(TestHBaseFsck.java:635)
	at org.apache.hadoop.hbase.util.TestHBaseFsck$2RunHbck.call(TestHBaseFsck.java:628)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)

~~HBASE-13591~~ tried to address this issue. It did improve the pass rate in the Linux environment (after the fix, I could not reproduce the failure on my machine); however, the test still failed intermittently in the Windows environment during testing of the 1.1 release.

Looking at the code, hbck uses ExponentialBackoffPolicy between retries: it sleeps 200ms after the first failed attempt to acquire the lock in ZK, then 400ms, then 800ms, and so on. Therefore, even if the first hbck run completes, the second hbck run can still fail because of the long sleep time between its retries.
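To make the growth concrete, here is a minimal standalone sketch of the doubling schedule described above. It does not use the HBase classes; the class name `UncappedBackoffSchedule`, its methods, and the choice of 8 retries are assumptions made only for illustration.

```java
public class UncappedBackoffSchedule {

    /** Sleep before retry number `attempt` (0-based): initial * 2^attempt. */
    static long sleepMillis(long initialMillis, int attempt) {
        return initialMillis << attempt;
    }

    /** Total time spent sleeping across the first `retries` retries. */
    static long totalSleepMillis(long initialMillis, int retries) {
        long total = 0;
        for (int i = 0; i < retries; i++) {
            total += sleepMillis(initialMillis, i);
        }
        return total;
    }

    public static void main(String[] args) {
        // 200 ms initial sleep comes from the description; 8 retries is an assumed count.
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.printf("retry %d: sleep %d ms, cumulative %d ms%n",
                    attempt + 1,
                    sleepMillis(200, attempt),
                    totalSleepMillis(200, attempt + 1));
        }
    }
}
```

With a 200ms start, the sixth sleep is already 6.4 seconds, so a waiter that has just gone to sleep can miss the lock being released for a long time.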

The proposed fix is to use ExponentialBackoffPolicyWithLimit and cap the maximum sleep time at a small value (e.g. 5 seconds; it should be configurable). This would make the test more robust.
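The idea of a capped backoff can be sketched as follows. This is an illustrative standalone example, not the code from the attached patch: `CappedBackoff`, `sleepMillis`, and `acquireWithRetries` are hypothetical names, and the lock attempt is modelled as a plain `BooleanSupplier` rather than a real ZK call.

```java
import java.util.function.BooleanSupplier;

public class CappedBackoff {

    /** Doubles the sleep per attempt (0-based) but never returns more than maxMillis. */
    static long sleepMillis(long initialMillis, long maxMillis, int attempt) {
        long sleep = initialMillis;
        // Stop doubling once the cap is reached, which also avoids overflow.
        for (int i = 0; i < attempt && sleep < maxMillis; i++) {
            sleep *= 2;
        }
        return Math.min(sleep, maxMillis);
    }

    /**
     * Calls tryAcquire up to maxAttempts times, sleeping with capped backoff
     * between failed attempts. Returns true as soon as an attempt succeeds.
     */
    static boolean acquireWithRetries(BooleanSupplier tryAcquire, int maxAttempts,
                                      long initialMillis, long maxMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (tryAcquire.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts - 1) {
                try {
                    Thread.sleep(sleepMillis(initialMillis, maxMillis, attempt));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // 200 ms start and 5 s cap follow the description; 8 attempts is an assumed count.
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.printf("retry %d: sleep %d ms%n",
                    attempt + 1, sleepMillis(200, 5000, attempt));
        }
    }
}
```

With the cap in place, a waiting hbck re-checks the lock at least every 5 seconds, so it notices a released lock within a bounded delay instead of an ever-growing one.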

Attachments

- HBASE-13732-addendum-master.patch (02/Jun/15 23:35, 2 kB, Stephen Yuan Jiang)
- HBASE-13732-addendum-branch-1.patch (02/Jun/15 23:35, 3 kB, Stephen Yuan Jiang)
- HBASE-13732.patch (21/May/15 00:15, 8 kB, Stephen Yuan Jiang)

Issue Links

is duplicated by

HBASE-13574 Broken TestHBaseFsck in master with hadoop 2.6.0

Closed

is related to

HBASE-13574 Broken TestHBaseFsck in master with hadoop 2.6.0

Closed

People

Assignee: Stephen Yuan Jiang

Reporter: Stephen Yuan Jiang

Votes: 0

Watchers: 6

Dates

Created: 21/May/15 00:12

Updated: 24/Jun/22 18:58

Resolved: 28/May/15 01:29