[HBASE-13877] Interrupt to flush from TableFlushProcedure causes dataloss in ITBLL - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.98.14, 1.0.2, 1.2.0, 1.1.1, 2.0.0
Component/s: integration tests, proc-v2
Labels: None

Description

ITBLL with 1.25B rows failed for me (and for Stack, as reported in https://issues.apache.org/jira/browse/HBASE-13811?focusedCommentId=14577834&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14577834).

HBASE-13811 and HBASE-13853 fixed an issue with WAL edit filtering.

The root cause this time seems to be different. The procedure-based flush interrupts the flush request when the procedure is cancelled because of an exception elsewhere. This leaves the memstore snapshot intact without aborting the server. The next flush then flushes the previous memstore snapshot with the current seqId (as opposed to the seqId from the memstore snapshot), which creates an hfile whose seqId is larger than that of its contents. The previous behavior in 0.98 and 1.0 (I believe) was that an interruption or exception after flush prepare caused an RS abort.
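The sequence described above can be illustrated with a toy model. This is only a sketch and not HBase code: `FlushSeqIdSketch`, `ToyRegion`, `FlushedFile`, `prepareFlush`, `commitFlush` and the `useSnapshotSeqId` switch are hypothetical names made up for illustration. It assumes one seqId per edit and a flush split into a prepare step (snapshot the memstore) and a commit step (write the file and stamp it with a seqId).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; every name here is hypothetical and none of it is HBase API. */
final class FlushSeqIdSketch {

    /** A flushed file: the seqIds of the edits it holds and the seqId stamped on it. */
    static final class FlushedFile {
        final long stampedSeqId;
        final List<Long> editSeqIds;

        FlushedFile(long stampedSeqId, List<Long> editSeqIds) {
            this.stampedSeqId = stampedSeqId;
            this.editSeqIds = editSeqIds;
        }

        long maxEditSeqId() {
            long max = -1L;
            for (long id : editSeqIds) {
                max = Math.max(max, id);
            }
            return max;
        }
    }

    /** A region with an active memstore and at most one pending snapshot. */
    static final class ToyRegion {
        private long lastSeqId = 0L;
        private List<Long> active = new ArrayList<>();
        private List<Long> snapshot = null;
        private long snapshotSeqId = -1L;

        void put() {
            active.add(++lastSeqId);
        }

        /** Prepare step: snapshot the active memstore unless a stale snapshot is still pending. */
        void prepareFlush() {
            if (snapshot != null) {
                return; // left over from an interrupted flush
            }
            snapshot = active;
            active = new ArrayList<>();
            snapshotSeqId = lastSeqId;
        }

        /** Commit step: write the snapshot out and stamp the file with a seqId. */
        FlushedFile commitFlush(boolean useSnapshotSeqId) {
            long stamp = useSnapshotSeqId ? snapshotSeqId : lastSeqId;
            FlushedFile file = new FlushedFile(stamp, snapshot);
            snapshot = null;
            return file;
        }
    }

    /** Replays the scenario: prepare, interrupted (no commit), more writes, next flush. */
    static FlushedFile simulate(boolean useSnapshotSeqId) {
        ToyRegion region = new ToyRegion();
        region.put();
        region.put();
        region.put();
        region.prepareFlush();   // snapshot holds edits 1..3
        // The flush is interrupted here: no commit and no abort, the snapshot stays.
        region.put();
        region.put();            // edits 4 and 5 land in the active memstore
        region.prepareFlush();   // stale snapshot is reused
        return region.commitFlush(useSnapshotSeqId);
    }

    public static void main(String[] args) {
        FlushedFile buggy = simulate(false);
        System.out.println("stamped=" + buggy.stampedSeqId
            + " maxEdit=" + buggy.maxEditSeqId());
    }
}
```

With `simulate(false)` the file is stamped with a seqId larger than any edit it contains, which is the mismatch described above. If recovery treats the stamped seqId as "everything up to here is persisted" (an assumption of this sketch), edits 4 and 5 would be skipped on WAL replay even though they were never written to a file. The `useSnapshotSeqId` variant is shown only for contrast and is not meant to represent the change made in the attached patches.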

Attachments


- hbase-13877_v5-branch-1.1.patch, 14/Jun/15 06:04, 29 kB, Enis Soztutar
- hbase-13877_v3-branch-1.1.patch, 12/Jun/15 00:16, 28 kB, Enis Soztutar
- hbase-13877_v2-to-v4-branch-1.1.patch, 13/Jun/15 02:07, 22 kB, Enis Soztutar
- hbase-13877_v2-branch-1.1.patch, 09/Jun/15 23:53, 9 kB, Enis Soztutar
- hbase-13877_v1.patch, 09/Jun/15 23:08, 9 kB, Enis Soztutar

Issue Links

is related to

HBASE-13811 Splitting WALs, we are filtering out too many edits -> DATALOSS (Closed)

People

Assignee: Enis Soztutar

Reporter: Enis Soztutar

Votes: 0

Watchers: 9

Dates

Created: 09/Jun/15 22:01

Updated: 17/Dec/15 23:45

Resolved: 17/Jun/15 19:23