Affects Version/s: 0.92.1
Fix Version/s: None
Exact HBase version: 0.92.1-cdh4.1.2
A couple of days ago I encountered a region-in-transition (RIT) problem with a single region.
After an hbck run, HBase started trying to assign the region, which then kept
bouncing between OFFLINE/PENDING_OPEN/OPENING for the following two days.
This was due to a split that went wrong in some way, which left several
reference files in the region directory even though the two relevant HFiles had been copied successfully to the daughter.
I will try to give as many details as possible, but unfortunately I was
unable to find any information about the split itself.
Short thread about this issue on the users-ML: http://mail-archives.apache.org/mod_mbox/hbase-user/201305.mbox/%3C5182758B.email@example.com%3E
Parent region: 5b9c16898a371de58f31f0bdf86b1f8b
Daughter region in question: 79c619508659018ff3ef0887611eb8f7
Rough sequence from the logs seems to be the following:
- Received request to open region:
- Setting up tabledescriptor config now ...
- Opening of region
  STARTKEY => '7128586022887322720',
  ENDKEY => '7130716361635801616',
  ENCODED => 79c619508659018ff3ef0887611eb8f7,}
  failed, marking as FAILED_OPEN in ZK
- File does not exist:
What happened was that somehow (and that's the question here) the daughter's
region folder contained some left-over reference files, which caused the
RegionServer to look up the parent region, which had already been deleted.
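The condition described above can be checked for mechanically. The following is a minimal Python sketch, not HBase code. It assumes that a reference file in a store directory is named `<hfile>.<encoded parent region>` and that a plain HFile name has no dot suffix, which is my understanding of the on-disk layout in this version. The function name `find_dangling_references` and the file names in the usage example are made up for illustration; only the two region names are taken from this report.

```python
import re

# Assumption: reference files are named "<hfile>.<encoded parent region>",
# both parts lowercase hex; plain HFiles have no dot suffix.
REF_NAME = re.compile(r"^([0-9a-f]+)\.([0-9a-f]+)$")


def find_dangling_references(store_files, existing_regions):
    """Return (file_name, parent_region) pairs whose parent region is gone.

    store_files: file names found in one column-family directory.
    existing_regions: encoded names of region directories still on disk.
    """
    dangling = []
    for name in store_files:
        match = REF_NAME.match(name)
        if match is None:
            continue  # plain HFile, nothing to resolve
        parent = match.group(2)
        if parent not in existing_regions:
            dangling.append((name, parent))
    return dangling


if __name__ == "__main__":
    # Hypothetical directory listing; the file names are invented.
    files = [
        "0a1b2c3d4e5f",                                   # plain HFile
        "6f7e8d9c0b1a.5b9c16898a371de58f31f0bdf86b1f8b",  # reference to parent
    ]
    # Only the daughter exists; the parent directory was already deleted.
    regions = {"79c619508659018ff3ef0887611eb8f7"}
    for name, parent in find_dangling_references(files, regions):
        print(f"{name} references missing parent region {parent}")
```

A check like this only reports candidates. Whether a left-over reference can be removed safely still depends on confirming that the referenced data is fully present in the daughter's own HFiles.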
Original contents of /hbase/documents/79c619508659018ff3ef0887611eb8f7/d:
I have attached the full FileNotFoundException stack trace.
Please let me know if I can provide more information or help otherwise.