HBASE-8502: Eternally stuck Region after split

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.92.1
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Exact HBase version: 0.92.1-cdh4.1.2

      A couple of days ago I encountered a RIT problem with a single region.
      After an hbck run it started trying to assign a region, which then kept
      bouncing between OFFLINE/PENDING_OPEN/OPENING for the next two days.

      This was due to a split gone wrong in some way, which led to several
      reference files being left in the region directory despite the two relevant HFiles having been copied successfully to the daughter.

      I will try to give as many details as possible, but unfortunately I was
      unable to find any information about the split itself.

      Short thread about this issue on the users-ML: http://mail-archives.apache.org/mod_mbox/hbase-user/201305.mbox/%3C5182758B.1060306@neofonie.de%3E

      ===

      Parent region: 5b9c16898a371de58f31f0bdf86b1f8b
      Daughter region in question: 79c619508659018ff3ef0887611eb8f7

      Rough sequence from the logs seems to be the following:

      ===

      • Received request to open region:
        documents,7128586022887322720,1363696791400.79c619508659018ff3ef0887611eb8f7.
      • Setting up tabledescriptor config now ...
      • Opening of region {NAME => 'documents,7128586022887322720,1363696791400.79c619508659018ff3ef0887611eb8f7.', STARTKEY => '7128586022887322720', ENDKEY => '7130716361635801616', ENCODED => 79c619508659018ff3ef0887611eb8f7,} failed, marking as FAILED_OPEN in ZK
      • File does not exist:
        /hbase/documents/5b9c16898a371de58f31f0bdf86b1f8b/d/0707b1ec4c6b41cf9174e0d2a1785fe9
        [...]
      ===

      What happened was that somehow (and that is the question here) the daughter's
      region folder contained some left-over reference files, which caused the
      RegionServer to look up the parent region, which had already been deleted.

      original contents of /hbase/documents/79c619508659018ff3ef0887611eb8f7/d:
      ==
      0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b
      47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b
      4f01ecd052ce464d81e79a62ea227d6b
      4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b
      eb7dbb09701d4353be24ca82481c4a7e
      ==
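
      For illustration, a minimal sketch of spotting such left-over reference files, assuming only the standard Hadoop FileSystem API (the NameNode address below is a made-up placeholder; the directory and the parent encoded name are the ones above). Reference files are recognisable by the '.<parent encoded name>' suffix:

      ==
      import java.net.URI;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      // Illustrative sketch: list the daughter's column-family directory and flag
      // files whose names end in ".<parent encoded name>", i.e. reference files
      // still pointing at the (now deleted) parent region.
      public class FindLeftoverReferences {
        public static void main(String[] args) throws Exception {
          String namenode = "hdfs://namenode:8020"; // placeholder NameNode address
          String daughterDir = "/hbase/documents/79c619508659018ff3ef0887611eb8f7/d";
          String parentEncoded = "5b9c16898a371de58f31f0bdf86b1f8b";

          FileSystem fs = FileSystem.get(URI.create(namenode), new Configuration());
          for (FileStatus file : fs.listStatus(new Path(daughterDir))) {
            String name = file.getPath().getName();
            boolean refToParent = name.endsWith("." + parentEncoded);
            System.out.println((refToParent ? "reference -> parent: " : "hfile: ") + name);
          }
        }
      }
      ==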

      I attached the full FileNotFoundException.

      Please let me know if I can provide more information or help otherwise.

      1. hbase_lost_parent.txt
        50 kB
        Dimitri Goldin
      2. hbase_run.log
        12 kB
        Dimitri Goldin
      3. stuck_region_exception.txt
        9 kB
        Dimitri Goldin

          Activity

          Jimmy Xiang added a comment -

          Closed this one for now. I think the region got stuck in transition because the hbck repair could have done something wrong. We can re-open it if we see the issue again without running hbck repair.

          Jean-Daniel Cryans added a comment -

          > Added logs of my two recent HBCK runs

          Could it be that an HBCK was run before that? Unfortunately, I can't find anything interesting in those logs...

          To continue I would need:

          • In the master log, grep for 5b9c16898a371de58f31f0bdf86b1f8b and get us the line where it splits and lists the two daughters
          • Using that line, you can tell which region server did the split, so get everything from that region server that relates to 5b9c16898a371de58f31f0bdf86b1f8b around the time it split
          • In the NN log, grep for 4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b
          Dimitri Goldin added a comment -

          Added logs of my two recent HBCK runs (please see hbck_run.log). As to whether hbck created the 'daughter' or not, I can't tell from the split logs, since they don't mention the daughters in this case.

          Jean-Daniel Cryans added a comment -

          This sounds like something someone else encountered that I helped fix. In their case, they ran HBCK after a split happened (not sure why they did) and it merged the parent and the daughters into a new region. The problem is that the reference files were still there, and they ended up alongside the files they reference. Since the parent also got moved, the referenced files moved too and were no longer where the references pointed. This is why, in that case, the region wasn't able to open (it got a FNFE), and it kept getting reassigned by the master.

          The fix is to delete the reference files, or at least move them away, since the original file is right there.

          Dimitri Goldin, can you verify that 79c619508659018ff3ef0887611eb8f7 is really a daughter of 5b9c16898a371de58f31f0bdf86b1f8b? It should say so in the RS log when it splits, or in the master log when the split is reported. If it's not a daughter, then this could definitely be the same issue, as 79c619508659018ff3ef0887611eb8f7 would be the region created by HBCK.
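
          For illustration, a minimal sketch of that cleanup, assuming only the standard Hadoop FileSystem API; the NameNode address and the sideline directory are made-up placeholders, and moving rather than deleting keeps the files around in case they are needed later:

          ==
          import java.net.URI;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileStatus;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          // Illustrative sketch: move dangling reference files ("<hfile>.<parent encoded name>")
          // out of the daughter's column-family directory into a sideline directory.
          public class SidelineLeftoverReferences {
            public static void main(String[] args) throws Exception {
              String namenode = "hdfs://namenode:8020"; // placeholder
              String daughterDir = "/hbase/documents/79c619508659018ff3ef0887611eb8f7/d";
              String parentEncoded = "5b9c16898a371de58f31f0bdf86b1f8b";
              Path sideline = new Path("/hbase/.sideline/79c619508659018ff3ef0887611eb8f7"); // placeholder

              FileSystem fs = FileSystem.get(URI.create(namenode), new Configuration());
              fs.mkdirs(sideline);
              for (FileStatus file : fs.listStatus(new Path(daughterDir))) {
                String name = file.getPath().getName();
                if (name.endsWith("." + parentEncoded)) {
                  // Move the dangling reference file aside instead of deleting it.
                  boolean ok = fs.rename(file.getPath(), new Path(sideline, name));
                  System.out.println((ok ? "moved " : "FAILED to move ") + name);
                }
              }
            }
          }
          ==

          Presumably nothing should be trying to open the region while the files are moved (for example, by disabling the table first).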

          Dimitri Goldin added a comment -

          Unfortunately not from March, and the only interesting lines that caught my eye during the recent runs are included in the appended log extracts. If you'd like to see the full hbck run, let me know and I'll try to get it.

          Jimmy Xiang added a comment -

          Do you have the hbck log? If so, I'd like to know if the parent region was still there at that moment.

          Dimitri Goldin added a comment -

          > I think it is most likely because the parent is still online so that we don't need the daughter. The parent may have gone through some compaction so the file is gone.

          Just want to add a small detail:

          Of course I cannot be sure how the region folders looked back then, but when I noticed the issue I immediately checked for the parent region's folder and it was completely gone, not just a single HFile. There was no mention in .META. either, and no splitA/splitB columns either. That's why, until now, I assumed that the whole parent was removed after one of the split attempts or by the regionserver cleanup code (it crashed after the point of no return, as mentioned in the log).
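
          For illustration, a minimal sketch of such a .META. check, assuming the 0.92-era client API; the scan start row and the printed output are illustrative, while the table name and parent encoded name are the ones from this issue:

          ==
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.client.HTable;
          import org.apache.hadoop.hbase.client.Result;
          import org.apache.hadoop.hbase.client.ResultScanner;
          import org.apache.hadoop.hbase.client.Scan;
          import org.apache.hadoop.hbase.util.Bytes;

          // Illustrative sketch: scan .META. for rows of table 'documents' and report
          // whether the parent's row is present and still carries info:splitA/info:splitB.
          public class CheckMetaForParent {
            public static void main(String[] args) throws Exception {
              String parentEncoded = "5b9c16898a371de58f31f0bdf86b1f8b"; // from this report
              Configuration conf = HBaseConfiguration.create();
              HTable meta = new HTable(conf, ".META.");
              try {
                ResultScanner scanner = meta.getScanner(new Scan(Bytes.toBytes("documents,")));
                for (Result row : scanner) {
                  String name = Bytes.toString(row.getRow());
                  if (!name.startsWith("documents,")) break; // past the table's rows
                  if (!name.contains(parentEncoded)) continue; // not the parent's row
                  byte[] splitA = row.getValue(Bytes.toBytes("info"), Bytes.toBytes("splitA"));
                  byte[] splitB = row.getValue(Bytes.toBytes("info"), Bytes.toBytes("splitB"));
                  System.out.println("parent row found: " + name
                      + " splitA=" + (splitA != null) + " splitB=" + (splitB != null));
                }
                scanner.close();
              } finally {
                meta.close();
              }
            }
          }
          ==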

          Jimmy Xiang added a comment -

          > The daughter region was able to stay offline without any problems

          I think it is most likely because the parent is still online, so we don't need the daughter. The parent may have gone through some compaction, so the file is gone. When the parent was split again, the same daughter region happened to be created; it had not been cleaned up properly during the previous split attempt and had a leftover reference file pointing to a file that is already gone. That's my guess. I will take a look at the code to make sure rolling back cleans up the daughter regions.

          As to moving forward, other than reviewing the rollback code, I think we can set up some continuous testing that does region splits all the time. We are sure to be able to reproduce it, if it is a code issue.
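
          For illustration, a rough sketch of what such continuous split testing could look like with the 0.92-era client API; the table name 'split_test', the column family 'd', the batch size and the sleep interval are all made-up placeholders:

          ==
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.client.HBaseAdmin;
          import org.apache.hadoop.hbase.client.HTable;
          import org.apache.hadoop.hbase.client.Put;
          import org.apache.hadoop.hbase.client.ResultScanner;
          import org.apache.hadoop.hbase.client.Scan;
          import org.apache.hadoop.hbase.util.Bytes;

          // Illustrative sketch: keep loading a test table, ask for splits, and then
          // verify the table is still fully scannable (a stuck region would hang or throw).
          public class ContinuousSplitTest {
            public static void main(String[] args) throws Exception {
              Configuration conf = HBaseConfiguration.create();
              HBaseAdmin admin = new HBaseAdmin(conf);
              HTable table = new HTable(conf, "split_test"); // pre-created test table with family 'd'
              byte[] family = Bytes.toBytes("d");

              for (long i = 0; ; i++) {
                // Write a batch of rows so there is something to split.
                for (int j = 0; j < 10000; j++) {
                  Put put = new Put(Bytes.toBytes(String.format("%020d", i * 10000 + j)));
                  put.add(family, Bytes.toBytes("q"), Bytes.toBytes("v"));
                  table.put(put);
                }
                table.flushCommits();

                admin.split("split_test");   // request a split of the test table
                Thread.sleep(30 * 1000);     // give the split some time to run

                // Full scan: every region must be online and readable for this to finish.
                long rows = 0;
                ResultScanner scanner = table.getScanner(new Scan());
                while (scanner.next() != null) {
                  rows++;
                }
                scanner.close();
                System.out.println("iteration " + i + ": scanned " + rows + " rows");
              }
            }
          }
          ==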

          Dimitri Goldin added a comment -

          Yes, during that time some systems were overloaded, which caused some stability issues. So the split probably failed because of a timeout.

          I checked the DN and NN logs, but they do not contain the period in question (they start at 2013-03-19 11:56) and are pretty useless. They mostly contain the already known FileNotFoundException and some renaming issues quoted below.

          The contents of the 79c619508659018ff3ef0887611eb8f7 daughter suggest that some split either succeeded despite a previously failed and improperly rolled-back attempt, or that the failed split simply failed to clean up. So I do believe that HBase deleted the parent region itself.

          My suspicion is that either there might be some flaw in the rollback logic under strange circumstances, or that re-tried splits don't check for left-over reference files and such. Though it is strange that one of the successfully copied HFiles still has its ref-file.

          I'm sorry that I cannot provide full logs for the period, as this has been lurking silently for quite a while.
          Any ideas as to why the daughter region was able to stay offline for so long without any problems? I think this might almost be a separate issue too.

          On the mailing list it seemed like Kevin Odell had encountered this issue before; maybe he can help us overcome the incomplete information from the logs.

          How can we go about this with what we have? Any ideas how to reproduce and verify the behaviour?

          hadoop-cmf-hdfs1-NAMENODE-mia-node08.miacluster.priv.log.out.2:2013-03-19 13:39:52,937 mia-node08.miacluster.priv WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /hbase/documents/e6227aaa6f6e03188372ec534bf7e150/d/0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b to /hbase/documents/79c619508659018ff3ef0887611eb8f7/d/0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b because destination exists
          hadoop-cmf-hdfs1-NAMENODE-mia-node08.miacluster.priv.log.out.2:2013-03-19 13:39:52,938 mia-node08.miacluster.priv WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /hbase/documents/e6227aaa6f6e03188372ec534bf7e150/d/47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b to /hbase/documents/79c619508659018ff3ef0887611eb8f7/d/47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b because destination exists
          hadoop-cmf-hdfs1-NAMENODE-mia-node08.miacluster.priv.log.out.2:2013-03-19 13:39:52,938 mia-node08.miacluster.priv WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /hbase/documents/e6227aaa6f6e03188372ec534bf7e150/d/4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b to /hbase/documents/79c619508659018ff3ef0887611eb8f7/d/4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b because destination exists

          Jimmy Xiang added a comment -

          Dimitri Goldin, thanks a lot for the log. We can tell the region was split twice. The first time was around 2013-03-18 18:53:10,164, but it failed and was rolled back at 2013-03-18 18:53:59,108. The second time was at 2013-03-18 19:44:59,963, which failed because an HFile was missing. The same HFile was the reason the region got stuck in transition.

          The first split took quite some time. My guess is that it had some problem accessing the HFile in question. Can you check your HDFS NN and DN logs for this file?

          File does not exist: /hbase/documents/5b9c16898a371de58f31f0bdf86b1f8b/d/0707b1ec4c6b41cf9174e0d2a1785fe9

          Do we know what happened to it?

          Dimitri Goldin added a comment -

          I hadn't noticed the region until I ran hbck, and according to the hbck run itself it just wasn't assigned anywhere anymore (I don't know why yet).

          It seems the actual problem happened sometime in March, during a time I was not in the office, but at least one of the daughters seems to have been stuck for a while. I think somebody restarted the cluster and ran hbck -repair back then, attempting to fix the issue. Somehow it went away for a while, meaning the regions just were not onlined.

          I didn't discover it until a couple of days ago, since the table in question is only updated every couple of weeks.

          Fortunately I was able to find some old logs in a backup from mid-March mentioning failed splits and rollbacks of the parent region, including a different daughter region (939c1e9d10cc4e97d7284025f20298fb), which seemed to have the same problem. I presume both were created on 2013-03-18 during the same failed split. Unfortunately they are not explicitly mentioned as daughters in the logs, since the split failed. Sadly I do not have any logs left between ~2013-03-20 and 2013-05-01, since most have been rotated by now.

          Unfortunately I was also unable to find any mention of the 79c619508659018ff3ef0887611eb8f7 region in the master logs from that time.

          As to what happened to the parent region after the split, I'm really not sure at the moment. It is obvious that it was removed at some point, making it impossible to online either daughter, even though parts of the logs state that the splits were rolled back. The last mention of the region I'm able to find is from 2013-03-18 19:45:00,014 (MASTER, point-of-no-return error).

          There is also another 'new' question: why did the attempts to online both daughters stop at some point, until hbck touched one of them? It's also unclear what happened to the second daughter (939...8fb).

          Please see the attached file, in which I tried to collect the relevant sections of the hbck, master and regionserver logs. I hope this helps. I will try to find more and update.

          Jimmy Xiang added a comment -

          Was this region already stuck in transition before you ran hbck? I think it would be more useful to have the split-related logs (both from the region server where the split happened and from the master). Do you know what happened to the parent region after it was split?


            People

            • Assignee: Unassigned
            • Reporter: Dimitri Goldin
            • Votes: 1
            • Watchers: 3
