[HBASE-27169] TestSeparateClientZKCluster is flaky - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.5.0, 3.0.0-alpha-4, 2.4.14
Component/s: test
Labels:
None

Hadoop Flags:

Reviewed

Description

https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/3773/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestSeparateClientZKCluster-output.txt

org.apache.hadoop.hbase.exceptions.MasterStoppedException: null
	at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3177) ~[classes/:?]
	at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1954) ~[classes/:?]
	at org.apache.hadoop.hbase.master.MasterRpcServices.balance(MasterRpcServices.java:743) ~[classes/:?]
	at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) ~[hbase-protocol-shaded-3.0.0-alpha-4-SNAPSHOT.jar:3.0.0-alpha-4-SNAPSHOT]
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:385) ~[classes/:?]
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) ~[classes/:?]
	at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:104) ~[classes/:?]
	at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:84) ~[classes/:?]

I think the problem is that, MasterStoppedException is a sub class of DoNotRetryIOException, so when hitting this issue, we will fail immediately.

And the client zk syncer is asynchoronous, so it is possible that when we call admin.balance, we haven't synced the new location yet, and it will throw the MasterStoppedException out soon and fail the UT.

Let me see how to fix it.

Attachments

Issue Links

links to

GitHub Pull Request #4587

Activity

People

Assignee:: Duo Zhang

Reporter:: Duo Zhang

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 30/Jun/22 15:03

Updated:: 08/Jul/22 06:50

Resolved:: 08/Jul/22 06:50