[HBASE-15837] Memstore size accounting is wrong if postBatchMutate() throws exception - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.3.0, 1.2.2, 0.98.20, 1.1.6, 2.0.0
Component/s: regionserver
Labels: None

Hadoop Flags: Reviewed

Description

Over in PHOENIX-2883, I've been trying to track down the root cause of an issue in which a negative memstoreSize ultimately caused an RS to abort. The tl;dr version is:

1. Something causes memstoreSize to become negative (not sure what is doing this yet).
2. All subsequent flushes short-circuit and don't run because they think there is no data to flush.
3. The region is eventually closed (commonly, for a move).
4. A final flush is attempted on each store before closing (which also short-circuits for the same reason), leaving unflushed data in each store.
5. The sanity check that each store's size is zero fails and the RS aborts.
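
The sequence above can be illustrated with a toy model. This is a minimal sketch, not HBase code: `ToyRegion`, `buggyDecrement`, and the single-store layout are hypothetical names and simplifications chosen only to show how a negative counter makes every flush short-circuit and leaves data behind at close time.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of the failure chain. Hypothetical names; not the HBase implementation. */
class ToyRegion {
    // Region-level counter that the flush decision consults.
    final AtomicLong memstoreSize = new AtomicLong();
    // Bytes actually held in the (single, simplified) store.
    long storeBytes = 0;

    /** A write adds data to the store and to the counter. */
    void put(long bytes) {
        storeBytes += bytes;
        memstoreSize.addAndGet(bytes);
    }

    /** Assumed accounting bug: the counter is decremented without removing data. */
    void buggyDecrement(long bytes) {
        memstoreSize.addAndGet(-bytes);
    }

    /** Returns true if a flush actually ran. */
    boolean flush() {
        if (memstoreSize.get() <= 0) {
            return false; // short-circuit: the counter claims there is nothing to flush
        }
        memstoreSize.addAndGet(-storeBytes);
        storeBytes = 0;
        return true;
    }

    /** Final flush plus sanity check; false stands in for the RS abort. */
    boolean close() {
        flush();
        return storeBytes == 0;
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();
        region.put(100);
        region.buggyDecrement(150); // counter is now negative
        System.out.println("flush ran: " + region.flush());
        System.out.println("close passed sanity check: " + region.close());
    }
}
```

In this model, once the counter drops below zero the 100 bytes written earlier can never be flushed, so the close-time check fails.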

I have a small patch that I think should improve our failure handling here: it prevents the RS abort safely by forcing a flush when memstoreSize is negative, and it logs a call trace whenever an update to memstoreSize makes it negative, to help find culprits in the future.
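
The two ideas in the patch description can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: `GuardedMemstoreCounter`, `addAndGet`, and `shouldFlush` are hypothetical names, and `java.util.logging` stands in for whatever logger the real code uses.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of the two safeguards; not the actual HBase patch. */
class GuardedMemstoreCounter {
    private static final Logger LOG =
        Logger.getLogger(GuardedMemstoreCounter.class.getName());

    private final AtomicLong memstoreSize = new AtomicLong();

    /** Applies the delta and logs a call trace if the result is negative. */
    long addAndGet(long delta) {
        long newSize = memstoreSize.addAndGet(delta);
        if (newSize < 0) {
            // Passing a Throwable makes the logger print the caller's stack,
            // which identifies the code path that broke the accounting.
            LOG.log(Level.SEVERE,
                "memstoreSize became negative: delta=" + delta + ", newSize=" + newSize,
                new Exception("call trace"));
        }
        return newSize;
    }

    /** Flush whenever the counter is non-zero, so a negative value forces a flush. */
    boolean shouldFlush() {
        return memstoreSize.get() != 0;
    }

    public static void main(String[] args) {
        GuardedMemstoreCounter counter = new GuardedMemstoreCounter();
        counter.addAndGet(100);
        counter.addAndGet(-150); // logs a call trace
        System.out.println("should flush: " + counter.shouldFlush());
    }
}
```

Treating any non-zero value as "flush needed" is one way to express "force a flush when memstoreSize is negative"; the real change may gate this differently.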

Attachments

- HBASE-15837.001.patch, 16/May/16 19:01, 4 kB, Josh Elser
- hbase-15837.branch-1.patch, 26/May/16 00:22, 5 kB, Enis Soztutar
- hbase-15837-v1.patch, 19/May/16 01:56, 6 kB, Enis Soztutar
- hbase-memstore-size-accounting.patch, 17/May/16 03:41, 5 kB, Enis Soztutar

Issue Links

is related to: PHOENIX-2883 Region close during automatic disabling of index for rebuilding can lead to RS abort (Resolved)

People

Assignee: Josh Elser
Reporter: Josh Elser
Votes: 0
Watchers: 7

Dates

Created: 16/May/16 18:13
Updated: 27/May/16 17:44
Resolved: 26/May/16 18:56