[HBASE-1364] [performance] Distributed splitting of regionserver commit logs - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.92.0
Component/s: Coprocessors
Labels: None
Hadoop Flags: Reviewed
Release Note: Adds distributed WAL splitting in place of single-process, master-orchestrated splitting. The feature is on by default; to disable it, set hbase.master.distributed.log.splitting=false.
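
A minimal sketch of disabling the feature, assuming the property is set in hbase-site.xml like other HBase settings (the release note names only the property and its value, not the file):

```xml
<!-- Assumed location: hbase-site.xml on the master. -->
<property>
  <name>hbase.master.distributed.log.splitting</name>
  <value>false</value>
</property>
```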

Description

~~HBASE-1008~~ made some improvements to our log splitting on regionserver crash, but splitting needs to run even faster.

(Below is from ~~HBASE-1008~~)

In the Bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting.

1. As is, regions starting up expect to find only one reconstruction log. We need to make them pick up a set of edit logs, and it should be fine for those logs to live elsewhere in HDFS, in an output directory written by all split participants, whether those are threads or a MapReduce-like distributed process. (Let's write our distributed sort first as an MR job so we learn what's involved; the distributed sort should use MR framework pieces as much as possible.) On startup, regions go to this directory and pick up the files written by the split participants, deleting the files and clearing the directory once all have been read in. Accepting multiple logs as input can also make the split process more robust than the current tenuous process, which loses all edits if it doesn't make it to the end without error.
2. Each column family rereads the reconstruction log to find its edits. We need to fix that: the split can sort the edits by column family so that each store reads only its own edits.
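
The two points above can be sketched as follows. This is an illustrative Python model only, not HBase code: the `Edit` record, `split_logs`, and `edits_for_store` are hypothetical names, and the logs are in-memory lists rather than files in HDFS. It shows edits from several logs being grouped by region and column family, so that a store replays only its own edits.

```python
from collections import defaultdict
from typing import NamedTuple


class Edit(NamedTuple):
    """Hypothetical stand-in for one commit-log entry."""
    region: str  # region the edit belongs to
    family: str  # column family within that region
    seq: int     # sequence number assigned when the edit was logged
    value: str   # payload, opaque in this sketch


def split_logs(logs):
    """Group edits from many logs by (region, family), ordered by seq."""
    grouped = defaultdict(list)
    for log in logs:
        for edit in log:
            grouped[(edit.region, edit.family)].append(edit)
    for edits in grouped.values():
        edits.sort(key=lambda e: e.seq)
    return dict(grouped)


def edits_for_store(split_output, region, family):
    """Return what a single store would replay: only its own edits."""
    return split_output.get((region, family), [])


# Two logs whose edits for region r1 are interleaved with other regions.
logs = [
    [Edit("r1", "cf_a", 1, "x"), Edit("r2", "cf_a", 2, "y")],
    [Edit("r1", "cf_b", 4, "w"), Edit("r1", "cf_a", 3, "z")],
]
split_output = split_logs(logs)
```

In a distributed version, each participant would write its share of this grouped output to the shared directory, and a region would read every file there before clearing it.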

Attachments

- HBASE-1364.patch, 19/Jan/11 18:11, 88 kB, Alex Newman
- 1364-v5.txt, 14/Apr/11 23:44, 162 kB, Michael Stack
- org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt, 16/Apr/11 20:45, 5.72 MB, Michael Stack

Issue Links

is part of: HBASE-1816 Master rewrite (Closed)
is related to: HBASE-3323 OOME in master splitting logs (Closed)
relates to: HBASE-1994 Master will lose hlog entries while splitting if region has empty oldlogfile.log (Closed)
relates to: HBASE-3889 NPE in Distributed Log Splitting (Closed)

Sub-Tasks

- Allow split type (distributed,master only) to be configurable (Closed, Unassigned)
- Breakup HLogSplitTest unit tests. (Closed, Unassigned)
- Add a method for creating persistent Sequential zk nodes. (Closed, Unassigned)

People

Assignee: Prakash Khemani

Reporter: Michael Stack

Votes: 1

Watchers: 17

Dates

Created: 01/May/09 05:03

Updated: 20/Nov/15 13:01

Resolved: 18/Apr/11 17:18

Time Tracking

Estimated:

Remaining:

Logged:

10h
