HBASE-10370

Compaction in out-of-date Store causes region split failure

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.94.3, 0.98.0, 0.99.0
    • Fix Version/s: 0.98.0, 0.96.2, 0.99.0
    • Component/s: Compaction
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In our production cluster, we encountered a problem where two daughter regions could not be opened because of a FileNotFoundException.

      2014-01-14,20:12:46,927 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of user_profile,xxxxxxxxx,1389671863815.99e016485b0bc142d67ae07a884f6966.; Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
      java.io.IOException: Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
      at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:375)
      at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:467)
      at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:69)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
      Caused by: java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/lgprc-xiaomi/user_profile/99e016485b0bc142d67ae07a884f6966/A/5e05d706e4a84f34acc2cf00f089a4cf
      ....

      The cause is that a compaction running in an out-of-date Store deletes hfiles that the daughter regions reference after the split. As a result, the daughter regions can never be opened.

      The timeline is as follows:

      Assumption: Store A in Region R contains two hfiles, a and b.
      t0: A compaction request for Store A(a+b) in Region R is submitted.

      t1: First split of Region R. This split times out and is rolled back. During the rollback, the region reinitializes all of its store objects (see SplitTransaction #824). The store in Region R is now A'(a+b).

      t2: The compaction requested at t0 runs (hfiles: a + b -> c): A(a+b) -> A(c). Hfiles a and b are archived.

      t3: Second split of Region R. R splits into two regions, R.0 and R.1, which create hfile references to hfiles a and b from Store A'(a+b).

      t4: Because hfiles a and b have been deleted, opening regions R.0 and R.1 fails with a FileNotFoundException.
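
      The timeline above can be replayed with a small toy model. This is only a sketch under stated assumptions: the classes Fs, Store and Region and the method runTimeline are hypothetical stand-ins, not HBase classes, and the skipStaleCompaction guard only illustrates one possible way to avoid the race. It is not taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StaleStoreDemo {
    /** Toy file system (hypothetical): the hfile names that still exist on disk. */
    static final class Fs {
        final Set<String> files = new HashSet<>();
    }

    /** Toy Store (hypothetical): an in-memory view of the hfiles it believes it owns. */
    static final class Store {
        final Fs fs;
        final List<String> hfiles;

        Store(Fs fs, List<String> hfiles) {
            this.fs = fs;
            this.hfiles = new ArrayList<>(hfiles);
        }

        /** Compacts every file in this view into one output file and removes the inputs. */
        void compact(String output) {
            fs.files.add(output);
            fs.files.removeAll(hfiles);
            hfiles.clear();
            hfiles.add(output);
        }
    }

    /** Toy Region (hypothetical): holds the store object that is currently in use. */
    static final class Region {
        final Fs fs;
        Store store;

        Region(Fs fs, Store store) {
            this.fs = fs;
            this.store = store;
        }

        /** Rollback of a failed split: the store object is replaced by a fresh one. */
        void rollbackSplit() {
            store = new Store(fs, store.hfiles);
        }

        /** Split: daughters reference the current store's hfiles; returns those missing on disk. */
        List<String> splitAndFindMissingFiles() {
            List<String> missing = new ArrayList<>();
            for (String f : store.hfiles) {
                if (!fs.files.contains(f)) {
                    missing.add(f);
                }
            }
            return missing;
        }
    }

    /** Replays t0..t4 and returns the hfiles the daughters reference but cannot find. */
    static List<String> runTimeline(boolean skipStaleCompaction) {
        Fs fs = new Fs();
        fs.files.addAll(Arrays.asList("a", "b"));
        Region r = new Region(fs, new Store(fs, Arrays.asList("a", "b")));

        Store requested = r.store;             // t0: compaction queued against Store A
        r.rollbackSplit();                     // t1: failed split, region now uses A'
        boolean stale = requested != r.store;  // the queued request points at the old object
        if (!(skipStaleCompaction && stale)) {
            requested.compact("c");            // t2: A(a+b) -> A(c); a and b leave the disk
        }
        return r.splitAndFindMissingFiles();   // t3/t4: references built from A'(a+b)
    }

    public static void main(String[] args) {
        System.out.println("missing without guard: " + runTimeline(false));
        System.out.println("missing with guard: " + runTimeline(true));
    }
}
```

      Without the guard, the model reports a and b as missing, which corresponds to the FileNotFoundException at t4. With the guard, the stale compaction is skipped and the second split finds both files.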

      I have added a test to identify this problem.

      After searching JIRA, I think HBASE-8502 may be the same problem. Dimitri Goldin

      1. HBASE-10370-v1.diff
        4 kB
        Liu Shaohui
      2. HBASE-10370-v2.diff
        4 kB
        Liu Shaohui
      3. 10370-v3.patch
        4 kB
        Ted Yu
      4. 10370v2.096.txt
        1.0 kB
        stack
      5. 10370-v4.patch
        1.0 kB
        Ted Yu

        Issue Links

          Activity

          Enis Soztutar made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Dimitri Goldin made changes -
          Link This issue relates to HBASE-8502 [ HBASE-8502 ]
          Ted Yu made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Ted Yu made changes -
          Summary "Compaction in out-of-date Store causes region split failed" "Compaction in out-of-date Store causes region split failure"
          Ted Yu made changes -
          Attachment 10370-v4.patch [ 12623747 ]
          Ted Yu made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          stack made changes -
          Fix Version/s 0.96.2 [ 12325658 ]
          stack made changes -
          Attachment 10370v2.096.txt [ 12623710 ]
          Ted Yu made changes -
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 0.98.0 [ 12323143 ]
          Fix Version/s 0.99.0 [ 12325675 ]
          Ted Yu made changes -
          Attachment 10370-v3.patch [ 12623704 ]
          Andrew Purtell made changes -
          Affects Version/s 0.98.0 [ 12323143 ]
          Liu Shaohui made changes -
          Attachment HBASE-10370-v2.diff [ 12623612 ]
          Liu Shaohui made changes -
          Description Timeline wording corrected in steps t0 and t2 ("send" changed to "sent"); the rest of the description is unchanged.
          Liu Shaohui made changes -
          Description Timeline wording revised in steps t1 and t3 ("A Split" changed to "First Split" and "Another Split"); the rest of the description is unchanged.
          Liang Xie made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Affects Version/s 0.94.3 [ 12323144 ]
          Affects Version/s 0.99.0 [ 12325675 ]
          Liang Xie made changes -
          Assignee Liu Shaohui [ liushaohui ]
          Liu Shaohui made changes -
          Attachment HBASE-10370-v1.diff [ 12623584 ]
          Liu Shaohui made changes -
          Field Original Value New Value
          Description Timeline step t2 reworded from "Run compaction(a + b -> c)" to "Run the compaction send in t0 . (hfile: a + b -> c)"; the rest of the description is unchanged.
          Liu Shaohui created issue -

            People

            • Assignee:
              Liu Shaohui
              Reporter:
              Liu Shaohui
            • Votes:
              0
              Watchers:
              13
