[HBASE-8701] distributedLogReplay need to apply wal edits in the receiving order of those edits - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.98.0, 0.99.0
Component/s: MTTR
Labels: None
Hadoop Flags: Reviewed

Description

This issue happens in distributedLogReplay mode when recovering multiple puts of the same key and version (timestamp). After replay, the value of the key is nondeterministic.

The original scenario raised by eclark:

For all edits the row key is the same.
There's a log with: [ A (ts = 0), B (ts = 0) ]
Replay the first half of the log.
A user puts in C (ts = 0).
The memstore has to flush.
A new HFile is created with [ C, A ] and MaxSequenceId = C's seqid.
Replay the rest of the log.
Flush.
B now sits in an HFile whose MaxSequenceId is larger than C's, so reads return B even though C was the last write.
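The scenario above can be modeled in a few lines of Java. This is a hypothetical simplification, not HBase code: all cells share one row key and timestamp, the receiving RS hands out fresh sequence ids starting at an assumed 100, and reads are assumed to break ties by picking the HFile with the larger MaxSequenceId. The class and method names (ReplayScenario, read, flush, run) are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class ReplayScenario {
    // Toy model: every cell has the same row key and timestamp (ts = 0).
    record Cell(String value, long seqId) {}

    // A flushed file: its cells plus the MaxSequenceId recorded at flush time.
    record HFile(List<Cell> cells, long maxSeqId) {}

    // Assumed read rule: for identical key + timestamp, the file with the larger
    // MaxSequenceId wins; inside that file the cell with the larger seqId wins.
    static String read(List<HFile> files) {
        HFile newestFile = files.get(0);
        for (HFile f : files) {
            if (f.maxSeqId() > newestFile.maxSeqId()) newestFile = f;
        }
        Cell newest = newestFile.cells().get(0);
        for (Cell c : newestFile.cells()) {
            if (c.seqId() > newest.seqId()) newest = c;
        }
        return newest.value();
    }

    // Moves the memstore into a new HFile stamped with the largest seqId it holds.
    static HFile flush(List<Cell> memstore) {
        long max = 0;
        for (Cell c : memstore) max = Math.max(max, c.seqId());
        HFile file = new HFile(List.copyOf(memstore), max);
        memstore.clear();
        return file;
    }

    static String run() {
        long nextSeqId = 100; // receiving RS assigns fresh, increasing sequence ids
        List<Cell> memstore = new ArrayList<>();
        List<HFile> files = new ArrayList<>();

        memstore.add(new Cell("A", nextSeqId++)); // replay first half of the log
        memstore.add(new Cell("C", nextSeqId++)); // user puts C during recovery
        files.add(flush(memstore));               // HFile with C and A, MaxSequenceId = C's seqid
        memstore.add(new Cell("B", nextSeqId++)); // replay the rest of the log
        files.add(flush(memstore));               // HFile with B, larger MaxSequenceId
        return read(files);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints B, although C was the last write
    }
}
```

Because replayed edits receive new sequence ids in arrival order, the original order of A, B and C is lost, and the flush timing decides the winner.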

The same issue can happen in similar situations, e.g. Put(key, t=T) in WAL1 and Put(key, t=T) in WAL2, where the replay order of the two WALs decides which value survives.

Below is the option (proposed by Ted) I'd like to use:

a) During replay, we pass the original WAL sequence number of each edit to the receiving RS.
b) In the receiving RS, we store the negated original sequence number of each WAL edit in the MVCC field of its KVs.
c) Add handling of negative MVCC values in KVScannerComparator and KVComparator.
d) In the receiving RS, write the original sequence number into an optional field of the WAL file to handle chained RS failures.
e) When opening a region, we add a safety bumper (a large number) so that the sequence numbers of the newly opened region do not collide with old sequence numbers.
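A minimal sketch of steps b), c) and e), assuming a toy cell model rather than the real KeyValue, KVComparator or KVScannerComparator classes: replayed cells carry the negated original WAL sequence number in their mvcc field, resolution decodes it, and the region reopens with an assumed SAFETY_BUMPER of 1000 added to the largest old sequence id. The names Cell, effectiveSeqId, resolve, openRegionSeqId and SAFETY_BUMPER are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class NegativeMvccSketch {
    // Hypothetical cell model: all cells share one row key and timestamp.
    // mvcc < 0  -> replayed edit; holds the negated original WAL sequence number (step b)
    // mvcc >= 0 -> ordinary write; ordered by the sequence id it was written with
    record Cell(String value, long mvcc, long writeSeqId) {}

    // Assumed bumper size; the proposal only says "a large number" (step e).
    static final long SAFETY_BUMPER = 1_000L;

    // Step e: a reopened region starts numbering above every old sequence id.
    static long openRegionSeqId(long maxOldSeqId) {
        return maxOldSeqId + SAFETY_BUMPER;
    }

    // Step c: decode a negative mvcc back into the original WAL sequence number.
    static long effectiveSeqId(Cell c) {
        return c.mvcc() < 0 ? -c.mvcc() : c.writeSeqId();
    }

    // Newest version = largest effective sequence id, independent of arrival
    // order or of which flushed file the cell ended up in.
    static String resolve(List<Cell> cells) {
        Cell best = cells.get(0);
        for (Cell c : cells) {
            if (effectiveSeqId(c) > effectiveSeqId(best)) best = c;
        }
        return best.value();
    }

    static String replayScenario() {
        // Original WAL: A (seqid 10), B (seqid 11), same key and ts.
        long next = openRegionSeqId(11);
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("A", -10, 0));    // replayed
        cells.add(new Cell("C", 0, ++next)); // new user write during recovery
        cells.add(new Cell("B", -11, 0));    // replayed later, possibly into another HFile
        return resolve(cells);
    }

    public static void main(String[] args) {
        System.out.println(replayScenario()); // prints C
    }
}
```

The winner is chosen by effective sequence id rather than by which flushed file a cell lands in, so the outcome no longer depends on replay or flush timing; the bumper is what lets live writes outrank every replayed edit.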

In the future, when we store sequence numbers along with KVs, we can adjust the above solution slightly to avoid overloading the MVCC field.

The other alternative options are listed below for reference:

Option one
a) Disallow writes during recovery.
b) During replay, pass the original WAL sequence IDs.
c) Hold flushes until all WALs of a recovering region are replayed. The memstore can hold these edits because we only recover unflushed WAL edits. For edits with the same key + version, whichever has the larger sequence ID wins.
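Option one can be sketched as a memstore that rejects client writes while recovering, refuses to flush, and resolves same key + version collisions by original sequence id. This is a hypothetical toy model (OptionOneSketch, Edit, replay, write, flushAllowed are illustrative names, not HBase APIs), and the sequence ids 20 and 35 are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class OptionOneSketch {
    // Hypothetical edit: row key, timestamp (version), value, and sequence id.
    // For replayed edits seqId is the ORIGINAL WAL sequence id (step b).
    record Edit(String row, long ts, String value, long seqId) {}

    private final Map<String, Edit> memstore = new HashMap<>();
    private boolean recovering = true;

    // Step c: same key + version -> the edit with the larger sequence id wins,
    // whatever order the edits arrive in.
    private void apply(Edit e) {
        memstore.merge(e.row() + "/" + e.ts(), e,
            (cur, inc) -> inc.seqId() > cur.seqId() ? inc : cur);
    }

    void replay(Edit e) { apply(e); }

    // Step a: client writes are rejected until recovery completes.
    void write(Edit e) {
        if (recovering) throw new IllegalStateException("region is recovering");
        apply(e);
    }

    // Step c: flushes are held while any WAL of the region is still being replayed.
    boolean flushAllowed() { return !recovering; }

    void finishRecovery() { recovering = false; }

    String get(String row, long ts) {
        Edit e = memstore.get(row + "/" + ts);
        return e == null ? null : e.value();
    }

    public static void main(String[] args) {
        OptionOneSketch region = new OptionOneSketch();
        // WAL2's edit (seqid 35) happens to be replayed before WAL1's (seqid 20).
        region.replay(new Edit("key", 7, "fromWAL2", 35));
        region.replay(new Edit("key", 7, "fromWAL1", 20));
        System.out.println(region.get("key", 7)); // prints fromWAL2
    }
}
```

The cost of this option is visible in the sketch: the region accepts no writes and cannot flush until every WAL has been replayed.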

Option two
a) During replay, pass the original WAL sequence IDs.
b) For each WAL edit, store the edit's original sequence ID along with its key.
c) During scanning, use the original sequence ID if present; otherwise, use the store file's sequence ID.
d) Compaction can keep only the put with the max sequence ID.
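Steps c) and d) of option two can be illustrated with a toy compaction that orders cells by their stored original sequence id when present, falls back to the store file's sequence id otherwise, and keeps one put per key + version. The record layout, the file sequence ids 500 and 600, and the names OptionTwoSketch, orderingId and compact are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

class OptionTwoSketch {
    // Hypothetical stored cell: the original WAL sequence id is kept next to the
    // key when the cell came from replay (step b); otherwise it is absent.
    record Cell(String row, long ts, String value, OptionalLong origSeqId) {}

    // Hypothetical store file: its cells plus the file's own sequence id.
    record StoreFile(List<Cell> cells, long fileSeqId) {}

    // Step c: prefer the original sequence id, fall back to the store file's.
    static long orderingId(Cell c, StoreFile f) {
        return c.origSeqId().orElse(f.fileSeqId());
    }

    // Step d: for each key + version, keep only the put with the largest id.
    static List<Cell> compact(List<StoreFile> files) {
        Map<String, Cell> keep = new LinkedHashMap<>();
        Map<String, Long> keepId = new HashMap<>();
        for (StoreFile f : files) {
            for (Cell c : f.cells()) {
                String key = c.row() + "/" + c.ts();
                long id = orderingId(c, f);
                Long best = keepId.get(key);
                if (best == null || id > best) {
                    keepId.put(key, id);
                    keep.put(key, c);
                }
            }
        }
        return new ArrayList<>(keep.values());
    }

    public static void main(String[] args) {
        // WAL2's put (original seqid 35) was replayed and flushed first, into a file
        // with seqid 500; WAL1's older put (original seqid 20) landed in a later file.
        StoreFile first = new StoreFile(
            List.of(new Cell("key", 7, "fromWAL2", OptionalLong.of(35))), 500);
        StoreFile second = new StoreFile(
            List.of(new Cell("key", 7, "fromWAL1", OptionalLong.of(20)),
                    new Cell("other", 7, "liveWrite", OptionalLong.empty())), 600);
        for (Cell c : compact(List.of(first, second))) {
            System.out.println(c.row() + " -> " + c.value());
        }
        // prints:
        // key -> fromWAL2
        // other -> liveWrite
    }
}
```

Ordering by file sequence id alone would have let the older put from WAL1 win here, because its file was flushed later.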

Please let me know if you have better ideas.

Attachments

- 8701-v3.txt (17/Jun/13 13:09, 9 kB, Ted Yu)
- hbase-8701-v4.patch (17/Jun/13 21:13, 33 kB, Jeffrey Zhong)
- hbase-8701-v5.patch (17/Jun/13 23:21, 33 kB, Jeffrey Zhong)
- hbase-8701-v6.patch (18/Jun/13 06:58, 37 kB, Jeffrey Zhong)
- hbase-8701-v7.patch (19/Jun/13 04:35, 42 kB, Jeffrey Zhong)
- hbase-8701-v8.patch (29/Jun/13 06:28, 47 kB, Jeffrey Zhong)
- hbase-8701-tag.patch (16/Dec/13 18:20, 26 kB, Jeffrey Zhong)
- hbase-8701-tag-v1.patch (16/Dec/13 19:32, 27 kB, Jeffrey Zhong)
- hbase-8701-tag-v2.patch (18/Dec/13 19:28, 44 kB, Jeffrey Zhong)
- hbase-8701-tag-v2-update.patch (19/Dec/13 20:00, 44 kB, Jeffrey Zhong)

Issue Links

is related to: HBASE-7006 [MTTR] Improve Region Server Recovery Time - Distributed Log Replay (Closed)

People

Assignee: Jeffrey Zhong
Reporter: Jeffrey Zhong
Votes: 0
Watchers: 18

Dates

Created: 06/Jun/13 18:38
Updated: 21/Feb/15 23:31
Resolved: 20/Dec/13 22:32