Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- None
- None
- Reviewed
Description
As part of RegionServer (RS) startup, the RS reports for duty to the Master. The Master acknowledges the request and adds the RS to its onlineServers list, which makes the RS a candidate for subsequent region assignments.
Once the Master acknowledges reportForDuty and sends back its response, the RS still performs several steps, such as initializing replication sources, before it actually becomes online. Initializing replication sources can stall: if the RS cannot connect to a peer cluster, for example because of a Kerberos misconfiguration, the RS can take around 20 minutes to come online.
Because the Master already considers the RS online, it tries to assign regions to it, and the assignment fails with ServerNotRunningYetException. The Master then tries to unassign the region, which fails with the same exception, leaving the region in the FAILED_CLOSE state.
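The failure sequence above can be reproduced in a toy Java model, shown here only as an illustration. Everything in it is hypothetical and heavily simplified: `PrematureOnlineDemo`, `RegionServer`, `reportForDuty` and `assign` are illustrative names, `ServerNotRunningYetException` is a local stand-in for HBase's exception of the same name, and the Master's retry logic is collapsed into one open attempt followed by one close attempt.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified model of the race; this is not HBase code.
public class PrematureOnlineDemo {

    enum RegionState { OPEN, OFFLINE, FAILED_CLOSE }

    // Local stand-in for HBase's ServerNotRunningYetException.
    static class ServerNotRunningYetException extends Exception {
        ServerNotRunningYetException(String server) {
            super(server + " is not running yet");
        }
    }

    static class RegionServer {
        final String name;
        // Becomes true only after startup work (e.g. replication source init) finishes.
        volatile boolean running = false;

        RegionServer(String name) { this.name = name; }

        void openRegion(String region) throws ServerNotRunningYetException {
            if (!running) throw new ServerNotRunningYetException(name);
        }

        void closeRegion(String region) throws ServerNotRunningYetException {
            if (!running) throw new ServerNotRunningYetException(name);
        }
    }

    // The Master's view: a server is "online" as soon as reportForDuty is acknowledged.
    static final Set<RegionServer> onlineServers = new LinkedHashSet<>();

    static void reportForDuty(RegionServer rs) {
        onlineServers.add(rs);
    }

    // One open attempt; on failure, one unassign (close) attempt on the same server.
    static RegionState assign(String region, RegionServer rs) {
        if (!onlineServers.contains(rs)) {
            throw new IllegalStateException(rs.name + " is not in onlineServers");
        }
        try {
            rs.openRegion(region);
            return RegionState.OPEN;
        } catch (ServerNotRunningYetException openFailure) {
            try {
                // Succeeds only if the server finished starting up in the meantime.
                rs.closeRegion(region);
                return RegionState.OFFLINE;
            } catch (ServerNotRunningYetException closeFailure) {
                return RegionState.FAILED_CLOSE;
            }
        }
    }

    public static void main(String[] args) {
        RegionServer rs = new RegionServer("rs1");
        reportForDuty(rs); // online from the Master's point of view...
        // ...but still initializing replication sources, so running == false.
        System.out.println(assign("region-a", rs)); // prints FAILED_CLOSE
    }
}
```

In this model the outcome depends only on whether the server had finished starting up when the Master contacted it, which is the gap the description points at.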
It would be better for the Master to check whether the RS is ready to accept assignment requests before adding it to the online servers list; such a check would account for startup delays like the one described above.
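One way to express the proposed check is a minimal sketch, assuming the RS can send the Master a separate "ready" signal once its startup work is done. The class `ReadinessGatedServerManager` and the methods `reportForDuty`, `reportReady` and `getAssignableServers` are hypothetical names for illustration; the sketch does not reflect HBase's actual ServerManager API or how this issue was resolved.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: a server that has reported for duty is only "registered";
// it becomes assignable after it explicitly reports that it is ready.
public class ReadinessGatedServerManager {

    private final Set<String> registeredServers = new LinkedHashSet<>();
    private final Set<String> onlineServers = new LinkedHashSet<>();

    // Called when the RS reports for duty: acknowledged, but not yet assignable.
    public synchronized void reportForDuty(String server) {
        registeredServers.add(server);
    }

    // Called once the RS has finished startup (e.g. replication sources initialized).
    // Returns true if the server was newly marked online.
    public synchronized boolean reportReady(String server) {
        if (!registeredServers.contains(server)) {
            return false; // unknown server: it must report for duty first
        }
        return onlineServers.add(server);
    }

    // Only servers that have signalled readiness are candidates for assignment.
    public synchronized Set<String> getAssignableServers() {
        return Collections.unmodifiableSet(new LinkedHashSet<>(onlineServers));
    }

    public static void main(String[] args) {
        ReadinessGatedServerManager sm = new ReadinessGatedServerManager();
        sm.reportForDuty("rs1");
        sm.reportForDuty("rs2");
        sm.reportReady("rs1"); // rs2 is still stuck initializing replication sources
        System.out.println(sm.getAssignableServers()); // prints [rs1]
    }
}
```

Keeping "registered" and "online" as separate sets means a slow startup only delays when a server receives regions, rather than causing assignments to fail against it.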
Issue Links
- breaks
  - HBASE-25774 ServerManager.getOnlineServer may miss some region servers when refreshing state in some procedure implementations (Resolved)
  - HBASE-25897 TestRetainAssignmentOnRestart is flaky after HBASE-25032 (Resolved)
- is related to
  - HBASE-25774 ServerManager.getOnlineServer may miss some region servers when refreshing state in some procedure implementations (Resolved)
- links to