In out product cluster, we encounter a problem that two daughter regions can not been opened for FileNotFoundException.
2014-01-14,20:12:46,927 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of user_profile,xxxxxxxxx,1389671863815.99e016485b0bc142d67ae07a884f6966.; Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
java.io.IOException: Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
Caused by: java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/lgprc-xiaomi/user_profile/99e016485b0bc142d67ae07a884f6966/A/5e05d706e4a84f34acc2cf00f089a4cf
The reason is that a compaction in an out-of-date Store deletes the hfiles, which are referenced by the daughter regions after split. This will cause the daughter regions can not be opened forever.
The timeline is that
Assumption: there are two hfiles: a, b in Store A in Region R
t0: A compaction request of Store A(a+b) in Region R is sent.
t1: First Split for Region R. But this split is timeout and rollbacked. In the rollback, region reinitializes all store objects , see SplitTransaction #824. Now the store is Region R is A'(a+b).
t2: Run the compaction sent in t0 . (hfile: a + b -> c): A(a+b) -> A(c). Hfile a and b are archived.
t3: Another Split for Region R. R splits into two region R.0, R.1, which create hfile references for hfile a, b from Store A'(a + b)
t4: For hfile a, b have been deleted, the opening for region R.0 and R.1 will failed for FileNotFoundException.
I have add a test to identity this problem.