[HBASE-19468] FNFE during scans and flushes - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.3.1
Fix Version/s: 1.3.2, 1.4.1, 2.0.0
Component/s: regionserver, Scanners
Labels: None
Hadoop Flags: Reviewed

Description

We see FileNotFoundExceptions (FNFE) on our 1.3 clusters when scans and flushes happen at the same time. This causes the regionserver to throw an UnknownScannerException, and the client retries.

This happens during the following sequence:

1. A scanner is open; the client has fetched some rows from the regionserver and is working on them.
2. A flush happens and the StoreScanner is updated with the flushed files (StoreScanner.updateReaders()).
3. A compaction happens on the region while the scanner is still open.
4. The compaction discharger runs and cleans up the newly flushed file, since no new scanners have been opened on it yet.
5. The client issues scan.next and, during StoreScanner.resetScannerStack(), we get an FNFE. The regionserver throws an UnknownScannerException. The client retries in 1.3. With branch-1.4, the scan fails with a DoNotRetryIOException.

ram_krish, my proposal is to increment the reader count during updateReaders() and decrement it during resetScannerStack(), so the discharger doesn't clean the file up. Scan lease expiries also have to be taken care of. Am I missing anything? Is there a better approach?
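
To make the proposal concrete, here is a rough Java sketch of the reference-counting idea. It is not HBase code and not the attached patch: RefCountSketch, FileHandle, SketchScanner and discharge() are hypothetical stand-ins, and only the method names updateReaders() and resetScannerStack() are borrowed from the description above. As a simplification, the sketch drops the pin as soon as the file has been "opened"; a real scanner would presumably hold its own reference on the opened reader from then on. The check-then-delete in discharge() is also not atomic with respect to updateReaders(), which a real implementation would have to address.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RefCountSketch {

    /** Hypothetical stand-in for a store file; not the real HBase class. */
    static final class FileHandle {
        final String name;
        final AtomicInteger refCount = new AtomicInteger();
        volatile boolean compactedAway; // superseded by a compaction output
        volatile boolean deleted;       // physically removed by the discharger

        FileHandle(String name) { this.name = name; }
    }

    /** Hypothetical discharger: deletes compacted files that nobody has pinned. */
    static int discharge(List<FileHandle> files) {
        int removed = 0;
        for (FileHandle f : files) {
            if (f.compactedAway && !f.deleted && f.refCount.get() == 0) {
                f.deleted = true;
                removed++;
            }
        }
        return removed;
    }

    /** Hypothetical scanner that pins flushed files until it has opened them. */
    static final class SketchScanner implements AutoCloseable {
        private final List<FileHandle> pending = new ArrayList<>();

        /** On flush: remember the new files and pin each one. */
        synchronized void updateReaders(List<FileHandle> flushed) {
            for (FileHandle f : flushed) {
                f.refCount.incrementAndGet();
                pending.add(f);
            }
        }

        /** On the next scan.next(): open the pending files, then drop the pins. */
        synchronized int resetScannerStack() {
            int opened = 0;
            try {
                for (FileHandle f : pending) {
                    if (f.deleted) {
                        // This is the failure described in step 5.
                        throw new UncheckedIOException(new FileNotFoundException(f.name));
                    }
                    opened++;
                }
            } finally {
                releasePending();
            }
            return opened;
        }

        /** Close or lease expiry: pins must be dropped here too, or files leak. */
        @Override
        public synchronized void close() {
            releasePending();
        }

        private void releasePending() {
            for (FileHandle f : pending) {
                f.refCount.decrementAndGet();
            }
            pending.clear();
        }
    }
}
```

In this sketch, a file that was flushed and then compacted while pinned survives a discharge() pass, and becomes eligible for deletion only after resetScannerStack() or close() has released the pin. The close() path is where the lease-expiry concern would be handled.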

Attachments

HBASE-19468_master.patch (6 kB, 18/Dec/17 10:17, ramkrishna.s.vasudevan)
HBASE-19468_1.4.patch (9 kB, 12/Dec/17 07:46, ramkrishna.s.vasudevan)
HBASE-19468-poc.patch (8 kB, 12/Dec/17 07:24, Thiruvel Thirumoolan)

Issue Links

relates to: HBASE-27484 FNFE on StoreFileScanner after a flush followed by a compaction (Resolved)

People

Assignee: ramkrishna.s.vasudevan

Reporter: Thiruvel Thirumoolan

Votes: 0

Watchers: 11

Dates

Created: 08/Dec/17 23:05

Updated: 15/Nov/22 00:15

Resolved: 20/Dec/17 06:12