
HBASE-7006: [MTTR] Improve Region Server Recovery Time - Distributed Log Replay


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.98.0, 0.95.1
    • Component/s: MTTR
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Distributed Log Replay Description:

      After a region server fails, we first assign a failed region to another region server, with its recovering state marked in ZooKeeper. A SplitLogWorker then replays edits from the WALs (Write-Ahead Logs) of the failed region server directly into the region after it has been re-opened in its new location. While a region is in recovering state, it can accept writes, but not reads (including Append and Increment), region splits, or merges.

      The feature piggybacks on the existing distributed log splitting framework and replays WAL edits directly to another region server instead of creating recovered.edits files.

      The advantages over the existing recovered.edits-based log splitting implementation:
      1) It eliminates the steps of writing and reading recovered.edits files. Thousands of recovered.edits files could be created and written concurrently during a region server recovery, and the resulting small random writes can degrade overall system performance.
      2) It allows writes even while a region is in recovering state; a failed-over region can accept writes again within seconds.

      The feature can be enabled by setting hbase.master.distributed.log.replay to true (it is false by default).
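
      Purely as an illustration (not part of the original issue), here is a minimal Java sketch of setting this flag through the standard Hadoop/HBase Configuration API; in a real deployment the property would normally be set in hbase-site.xml on the master and region servers before restart. The class and constant names below are hypothetical.

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.hbase.HBaseConfiguration;

          public class EnableDistributedLogReplay {
              // Property key quoted in the release note above; the constant name is illustrative.
              private static final String DISTRIBUTED_LOG_REPLAY_KEY =
                      "hbase.master.distributed.log.replay";

              public static void main(String[] args) {
                  // HBaseConfiguration.create() loads hbase-default.xml and hbase-site.xml
                  // from the classpath into a Hadoop Configuration object.
                  Configuration conf = HBaseConfiguration.create();

                  // The property defaults to false; set it to true to enable
                  // distributed log replay for whatever reads this Configuration.
                  conf.setBoolean(DISTRIBUTED_LOG_REPLAY_KEY, true);

                  System.out.println(DISTRIBUTED_LOG_REPLAY_KEY + " = "
                          + conf.getBoolean(DISTRIBUTED_LOG_REPLAY_KEY, false));
              }
          }

      Note that setting the flag in a client-side Configuration like this only affects that process; the HMaster and region servers pick the value up from their own hbase-site.xml.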

Description

      Just saw an interesting issue where a cluster went down hard and 30 nodes had 1700 WALs to replay. Replay took almost an hour. It looks like it could run faster; much of the time is spent zk'ing and nn'ing.

      Putting in 0.96 so it gets a look at least. Can always punt.

Attachments

    1. LogSplitting Comparison.pdf (50 kB, Jeffrey Zhong)
    2. ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf (130 kB, Jeffrey Zhong)
    3. hbase-7006-combined.patch (234 kB, Jeffrey Zhong)
    4. hbase-7006-combined-v1.patch (246 kB, Jeffrey Zhong)
    5. hbase-7006-combined-v4.patch (298 kB, Jeffrey Zhong)
    6. hbase-7006-combined-v5.patch (307 kB, Jeffrey Zhong)
    7. hbase-7006-combined-v6.patch (315 kB, Jeffrey Zhong)
    8. hbase-7006-combined-v7.patch (315 kB, Jeffrey Zhong)
    9. hbase-7006-combined-v8.patch (311 kB, Jeffrey Zhong)
    10. hbase-7006-combined-v9.patch (312 kB, Jeffrey Zhong)
    11. hbase-7006-addendum.patch (1.0 kB, Jeffrey Zhong)
    12. 7006-addendum-3.txt (2 kB, Ted Yu)

Issue Links

Activity

People

    Assignee: Jeffrey Zhong (jeffreyz)
    Reporter: Michael Stack (stack)
    Votes: 0
    Watchers: 31

Dates

    Created:
    Updated:
    Resolved: