Details
Type: Improvement
Status: Closed
Priority: Critical
Resolution: Fixed
Reviewed
Description
IIRC, this is an idea that came from the lads at Xiaomi.
I have a small cluster of 6 regionservers and one went down. It had a few WALs. I see this in the logs:
2013-10-09 05:47:27,890 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 25 unassigned = 21
WAL splitting is held up for want of splitter slots out on the cluster: 21 of the 25 tasks are still unassigned.
We need to be careful we don't overwhelm the foreground regionservers, but more splitters per regionserver should help get everything back online faster.
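A rough back-of-the-envelope sketch of why more slots should help, using the 25 tasks from the log line above. Everything else here is an assumption for illustration: the function name `split_rounds` is made up, 5 live regionservers is just 6 minus the one that died, and the model pretends every split task takes the same time and every slot stays busy. It is not how SplitLogManager schedules work.

```python
import math


def split_rounds(total_tasks: int, live_servers: int, slots_per_server: int) -> int:
    """Idealized number of rounds to drain the split queue.

    Assumes each slot handles exactly one task per round and all
    tasks take equally long (a simplification, not HBase behavior).
    """
    capacity = live_servers * slots_per_server
    if capacity <= 0:
        raise ValueError("need at least one splitter slot")
    return math.ceil(total_tasks / capacity)


if __name__ == "__main__":
    # 25 tasks comes from the log line; 5 live servers is an assumption.
    for slots in (1, 2, 4):
        print(f"slots per server = {slots}: {split_rounds(25, 5, slots)} rounds")
```

Under these assumptions the queue drains in 5 rounds with one slot per server, 3 with two, and 2 with four. The returns diminish quickly, which fits the caution about not overwhelming the regionservers that are still serving foreground traffic.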