[HBASE-16820] BulkLoad mvcc visibility only works accidentally - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.1.8
Fix Version/s: None
Component/s: None
Labels: None

Description

sergey.soldatov has been debugging an issue with a 1.1 code base where the commit for ~~HBASE-16721~~ broke bulk load visibility. After a bulk load, the bulk-loaded files are not visible because the sequence id assigned to the bulk load is not advanced in mvcc.

Debugging further, we noticed that the bulk load (BL) behavior is wrong, but it works "accidentally" in all code bases (except 1.1, where it broke after ~~HBASE-16721~~). Let me explain:

- A BL request can optionally request a flush beforehand (this should be the default), which causes the flush to happen with some sequenceId. The flush sequence id is one past all the cells' sequence ids, and it is returned as the result of the flush operation.
- BL then uses this particular sequenceId to mark the files, but does not get a new sequence id of its own or advance the mvcc number.
- BL completes WITHOUT making sure that the sequence id is visible.
- BL does, however, write entries to the WAL for the BL event. In 1.2 code bases these go through the whole mvcc + seqId path, which makes sure that earlier sequence ids (including the flush sequenceId) are visible via mvcc.

The problem with 1.1 is that the WAL entries only get sequence ids, but do not touch mvcc. With the patch for ~~HBASE-16721~~, we have made it so that the flushedSequenceId is not used in mvcc as the highest read point (although all the data is still visible).
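The difference between the two code lines can be illustrated with a toy model. The sketch below is not HBase code: `ToyMvcc` and all of its methods are hypothetical names, visibility is reduced to the single rule `seqId <= readPoint`, and the flush is assumed not to move the read point (the post-HBASE-16721 1.1 behavior described above).

```java
/** Toy model only (hypothetical names, not HBase classes). */
class ToyMvcc {
    private long lastSeqId = 0;  // last sequence id handed out
    private long readPoint = 0;  // readers see only seqId <= readPoint

    /** Normal write: takes a new sequence id, then the read point advances to it. */
    long put() {
        long seqId = ++lastSeqId;
        readPoint = seqId;
        return seqId;
    }

    /** Flush: returns an id one past all cells; the read point stays where it is. */
    long flush() {
        return ++lastSeqId;
    }

    /** WAL marker as described for 1.2: goes through both seqId and mvcc. */
    long walMarkerWithMvcc() {
        long seqId = ++lastSeqId;
        readPoint = seqId; // side effect: every earlier id becomes visible too
        return seqId;
    }

    /** WAL marker as described for 1.1: sequence id only, mvcc untouched. */
    long walMarkerSeqIdOnly() {
        return ++lastSeqId;
    }

    boolean isVisible(long seqId) {
        return seqId <= readPoint;
    }

    public static void main(String[] args) {
        ToyMvcc v12 = new ToyMvcc();
        v12.put();
        long bulkId12 = v12.flush();   // the bulk load reuses the flush id
        v12.walMarkerWithMvcc();
        System.out.println("1.2-style visible: " + v12.isVisible(bulkId12));

        ToyMvcc v11 = new ToyMvcc();
        v11.put();
        long bulkId11 = v11.flush();   // the bulk load reuses the flush id
        v11.walMarkerSeqIdOnly();
        System.out.println("1.1-style visible: " + v11.isVisible(bulkId11));
    }
}
```

In this model the 1.2-style run reports the bulk load id as visible only because the later WAL marker happened to push the read point past it; the 1.1-style run leaves it invisible.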

BL relying on the flush sequence id is wrong for two reasons:

- BL files are loaded with the flush sequence id from the memstore. This particular sequence id is used twice for two different things and ends up being the sequence id for the flushed file as well as the BL'ed files.
- BL should make sure that it gets a new sequence id and that this sequence id is visible before returning the results.
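A minimal sketch of both points, again as a toy model rather than HBase code. `ToyBulkLoadRegion` and its methods are hypothetical; in particular, "making the id visible" is modeled as a direct assignment to the read point, whereas a real region server would have to go through its mvcc machinery and wait for earlier writes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

/** Toy model only (hypothetical names, not HBase classes). */
class ToyBulkLoadRegion {
    private long lastSeqId = 0;   // last sequence id handed out
    private long readPoint = 0;   // readers see only seqId <= readPoint
    private final List<Long> fileSeqIds = new ArrayList<>(); // ids tagged on store files

    /** Normal write: new sequence id, read point advances to it. */
    long put() {
        long seqId = ++lastSeqId;
        readPoint = seqId;
        return seqId;
    }

    /** Flush: writes a file tagged with an id one past all cells; no mvcc advance. */
    long flush() {
        long flushSeqId = ++lastSeqId;
        fileSeqIds.add(flushSeqId);
        return flushSeqId;
    }

    /** Behavior criticized in this issue: the bulk load reuses the flush id. */
    long bulkLoadReusingFlushId() {
        long seqId = flush();
        fileSeqIds.add(seqId);    // same id now tags two different files
        return seqId;
    }

    /** Behavior asked for: own sequence id, visible before returning. */
    long bulkLoadWithOwnId() {
        flush();
        long seqId = ++lastSeqId; // fresh id for the bulk-loaded files
        fileSeqIds.add(seqId);
        readPoint = seqId;        // stand-in for advancing/waiting on mvcc
        return seqId;
    }

    boolean isVisible(long seqId) {
        return seqId <= readPoint;
    }

    /** True if one sequence id has been used for more than one file. */
    boolean hasDuplicateFileIds() {
        return new HashSet<>(fileSeqIds).size() != fileSeqIds.size();
    }
}
```

With one `put()` followed by `bulkLoadReusingFlushId()`, the model ends up with a duplicated file id that is not visible; with `bulkLoadWithOwnId()` the ids are distinct and the bulk load id is visible when the call returns.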

ndimiduk FYI.

Attachments

HBASE-16820-branch-1.1-v0.patch (1 kB), uploaded 13/Oct/16 15:28 by Allan Yang

Issue Links

is related to

HBASE-16721 Concurrency issue in WAL unflushed seqId tracking

Closed

People

Assignee: Unassigned

Reporter: Enis Soztutar

Votes: 0

Watchers: 14

Dates

Created: 13/Oct/16 00:08

Updated: 11/Jun/22 18:57

Resolved: 11/Jun/22 18:57