Liyin has commented on the revision "[jira]
HBASE-4742 Split dead server's log in parallel".
Thanks, Mikhail, for your quick response.
We have agreed on most of the discussion here.
The remaining discussion focuses on the number of threads launched in the master for splitting dead servers' logs, which has made me reconsider our motivation for parallel distributed log splitting here.
Our basic motivation is that log splitting should not block the region server process queue. Also, distributed log splitting itself is designed to split logs for a large number of region servers. So we could batch all the dead region servers together into a queue and launch a single thread to do the distributed log splitting, instead of running distributed log splitting for each dead server in a separate thread.
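A minimal sketch of the batching idea, assuming dead servers are identified by name and that the actual distributed split is passed in as a callback. DeadServerBatcher, enqueue, takeBatch, and startSplitter are hypothetical names for illustration, not classes or methods from the HBase code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: none of these names exist in HBase.
class DeadServerBatcher {
    // Dead server names waiting for their logs to be split.
    private final BlockingQueue<String> deadServers = new LinkedBlockingQueue<>();

    // Called from the shutdown handler; returns immediately so the
    // region server process queue is never blocked by log splitting.
    void enqueue(String serverName) {
        deadServers.add(serverName);
    }

    // Waits for at least one dead server, then drains everything queued so
    // far into one batch. Returns an empty list if the thread is interrupted.
    List<String> takeBatch() {
        List<String> batch = new ArrayList<>();
        try {
            batch.add(deadServers.take());
            deadServers.drainTo(batch);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return batch;
    }

    // Starts the single splitter thread. splitLogs stands in for the
    // distributed log splitting call, which is assumed to accept a batch.
    Thread startSplitter(Consumer<List<String>> splitLogs) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                List<String> batch = takeBatch();
                if (!batch.isEmpty()) {
                    splitLogs.accept(batch);
                }
            }
        }, "dead-server-log-splitter");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

With this shape, servers that die while a split is in progress simply accumulate in the queue and are picked up together in the next batch.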
src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java:333 Thank you for clarifying your concern.
For each dead server, the master would receive the znode expiration event for that server only once.
So the master wouldn't have two threads splitting the same dead region server's log at the same time.
src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java:337 Fine. I would change it to "Succeeded in splitting".
src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java:347-348 Thanks for clarifying.
src/test/java/org/apache/hadoop/hbase/master/TestMultiRegionServerShutDown.java:135 Actually, I don't have to catch the exceptions here explicitly.
It won't affect the unit test results.
Thanks for the discussion.
src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java:312 1) We use distributed log splitting for the dead region server as well.
2) Even though we use a thread pool to execute the splits, I would bound the maximum number of threads by the number of region servers.
What do you think the maximum number of threads for the executor thread pool should be here?
Also, as in the example you mentioned, suppose 500 region servers went down. The master would launch 500 threads to do distributed log splitting in parallel. It won't choke the master too much, since the split work is done on the region server side.
3) But this discussion also leads us to another good point. Let's say a large number of region servers die for some reason. Should we batch these dead region servers and split their logs together, instead of splitting each server's log in parallel?
Any ideas, Mikhail and Prakash?
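For point 2), a minimal sketch of one way to bound the executor, assuming a configured upper limit is combined with the region server count. BoundedSplitPool, boundThreads, newPool, and the configured cap are hypothetical and only illustrate the question; they are not part of the patch.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names exist in HBase.
class BoundedSplitPool {
    // Caps the thread count at the smaller of a configured limit (an
    // assumption) and the number of region servers, but never below one.
    static int boundThreads(int configuredMax, int regionServerCount) {
        return Math.max(1, Math.min(configuredMax, regionServerCount));
    }

    // Fixed-size pool: at most maxThreads splits run concurrently, and any
    // further dead servers wait in the unbounded work queue.
    static ThreadPoolExecutor newPool(int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads,
            60L, TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>());
        // Let idle threads exit so the master holds no threads when no
        // server is dead.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }
}
```

With a cap of 16, the 500-server example would run 16 splits at a time and queue the remaining 484, rather than starting 500 threads at once.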