[HBASE-3809] .META. may not come back online if > number of executors servers crash and one of those > number of executors was carrying meta - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Invalid
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
None

Description

This is a duplicate of another issue but at the moment I cannot find the original.

If you had a 700 node cluster and then you ran something on the cluster which killed 100 nodes, and .META. had been running on one of those downed nodes, well, you'll have all of your master executors processing ServerShutdowns and more than likely non of the currently processing executors will be servicing the shutdown of the server that was carrying .META.

Well, for server shutdown to complete at the moment, an online .META. is required. So, in the above case, we'll be stuck. The current executors will not be able to clear to make space for the processing of the server carrying .META. because they need .META. to complete.

We can make the master handlers have no bound so it will expand to accomodate all crashed servers – so it'll have the one .META. in its queue – or we can change it so shutdown handling doesn't require .META. to be on-line (its used to figure the regions the server was carrying); we could use the master's in-memory picture of the cluster (But IIRC, there may be holes ....TBD)

Attachments

Issue Links

relates to

HBASE-5843 Improve HBase MTTR - Mean Time To Recover

Closed

Activity

People

Assignee:: Unassigned

Reporter:: Michael Stack

Votes:: 0 Vote for this issue

Watchers:: 9 Start watching this issue

Dates

Created:: 21/Apr/11 06:44

Updated:: 12/Jun/22 18:33

Resolved:: 08/Jan/13 07:59