Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version/s: 0.98.24
- Labels: None
- Hadoop Flags: Reviewed
Description
The way we currently split the log is shown in the following figure:
[figure: existing log-splitting flow]
The problem is that the OutputSink writes the recovered edits while the log is being split, which means it creates one WriterAndPath per region and keeps it open until the end. If the cluster is small and the number of regions per region server is large, the split creates too many HDFS streams at the same time, and it becomes prone to failure because each datanode has to handle too many streams.
Thus I came up with a new way to split the log.
We try to cache all the recovered edits in memory, but if the cached data exceeds the MaxHeapUsage, we pick the largest EntryBuffer and write it to a file (closing the writer once it is done). Then, after all entries have been read into memory, we start a writeAndCloseThreadPool, which uses a fixed number of threads to write the remaining buffers to files. As a result, we never create more HDFS streams than the configured hbase.regionserver.hlog.splitlog.writer.threads.
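To make the flow concrete, here is a minimal Java sketch of the bounded approach. All names in it (EntryBuffer, MAX_HEAP_USAGE, WRITER_THREADS, writeAndClose, the crude byte counting) are simplified stand-ins for illustration, not the actual HBase classes, configuration plumbing, or recovered-edits writers:

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedSplitSketch {

  /** Buffered recovered edits for one region. */
  static class EntryBuffer {
    final String region;
    final List<String> edits = new ArrayList<>();
    long heapSize = 0;

    EntryBuffer(String region) { this.region = region; }

    void append(String edit) {
      edits.add(edit);
      heapSize += edit.length(); // crude heap accounting, enough for the sketch
    }
  }

  static final long MAX_HEAP_USAGE = 1024; // stand-in for the real heap bound
  static final int WRITER_THREADS = 3;     // stand-in for hbase.regionserver.hlog.splitlog.writer.threads

  final Map<String, EntryBuffer> buffers = new HashMap<>();
  long totalHeap = 0;

  /** Cache an edit; if the cache grows past the bound, flush the largest buffer. */
  void append(String region, String edit) {
    buffers.computeIfAbsent(region, EntryBuffer::new).append(edit);
    totalHeap += edit.length();
    while (totalHeap > MAX_HEAP_USAGE && !buffers.isEmpty()) {
      flushLargestBuffer(); // open one writer, write, close it, move on
    }
  }

  /** Pick the largest buffer, write it out, and close the writer immediately. */
  void flushLargestBuffer() {
    EntryBuffer largest = buffers.values().stream()
        .max((a, b) -> Long.compare(a.heapSize, b.heapSize))
        .orElse(null);
    if (largest == null) {
      return;
    }
    buffers.remove(largest.region);
    totalHeap -= largest.heapSize;
    writeAndClose(largest);
  }

  /** After the whole WAL has been read, drain the remaining buffers with a bounded pool. */
  void finish() throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(WRITER_THREADS);
    for (EntryBuffer buf : buffers.values()) {
      pool.submit(() -> writeAndClose(buf)); // at most WRITER_THREADS streams open at once
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }

  /** Stand-in for creating a recovered-edits writer, writing, and closing it. */
  void writeAndClose(EntryBuffer buf) {
    System.out.println("wrote " + buf.edits.size() + " edits for " + buf.region);
  }

  public static void main(String[] args) throws InterruptedException {
    BoundedSplitSketch sink = new BoundedSplitSketch();
    for (int i = 0; i < 100; i++) {
      sink.append("region-" + (i % 10), "edit-" + i + "-xxxxxxxxxxxxxxxxxxxx");
    }
    sink.finish();
  }
}
{code}

Flushing the largest buffer first frees the most heap per writer opened, so the split stays under the memory bound while opening streams strictly one at a time during the read phase.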
The biggest benefit is that we can control the number of streams created during log splitting: it will not exceed hbase.regionserver.wal.max.splitters * hbase.regionserver.hlog.splitlog.writer.threads, whereas before it was hbase.regionserver.wal.max.splitters * the number of regions the HLog contains.
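For illustration (the numbers here are assumed, not taken from this issue): with hbase.regionserver.wal.max.splitters = 2 and hbase.regionserver.hlog.splitlog.writer.threads = 3, the new bound is 2 * 3 = 6 concurrent recovered-edits streams, whereas under the old scheme a split of WALs each carrying edits for 500 regions could open up to 2 * 500 = 1000 streams.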
Attachments
Issue Links
- duplicates
  - HBASE-21651 when splitting hlog, occurred exception (Resolved)
  - HBASE-18971 Limit the concurrent opened wal writers when splitting (Resolved)
1. Enable hbase.split.writer.creation.bounded by default (Resolved, Unassigned)