HBASE-2964

Deadlock when RS tries to RPC to itself inside SplitTransaction

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.90.0
    • Fix Version/s: 0.90.0
    • Component/s: IPC/RPC, regionserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In testing the 0.89.20100830 rc, I ran into a deadlock with the following situation:

      • All of the IPC Handler threads are blocked on the region lock, which is held by CompactSplitThread.
      • CompactSplitThread is in the process of trying to edit META to create the offline parent. META happens to be hosted on the same server that is executing the split.

      Therefore, the CompactSplitThread is trying to connect back to itself, but all of the handler threads are blocked, so the IPC never happens. Thus, the entire RS gets deadlocked.
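
The situation described above can be reproduced in miniature. The following is only a toy model, not HBase code: the class and method names are invented for illustration, a fixed thread pool stands in for the IPC handlers, and a task submitted to that pool stands in for the RPC the split thread makes back to its own server. A short timeout is used so the demonstration terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of the self-RPC deadlock. None of these names are HBase classes. */
class SelfRpcDeadlockDemo {

    /** Returns true if the simulated META edit completed, false if it timed out. */
    static boolean metaEditCompletes(int handlerCount, boolean holdLockDuringRpc) {
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
        CountDownLatch handlersBusy = new CountDownLatch(handlerCount);

        // The "split thread" (the caller) takes the region write lock.
        regionLock.writeLock().lock();
        try {
            // Client requests occupy every handler; each one blocks on the read lock.
            for (int i = 0; i < handlerCount; i++) {
                handlers.submit(() -> {
                    handlersBusy.countDown();
                    regionLock.readLock().lock();
                    regionLock.readLock().unlock();
                });
            }
            handlersBusy.await();

            // The fix direction discussed in this issue: drop the lock before any RPC.
            if (!holdLockDuringRpc) {
                regionLock.writeLock().unlock();
            }

            // The "RPC to self" needs a free handler to make progress.
            Future<String> metaEdit = handlers.submit(() -> "meta updated");
            metaEdit.get(500, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // no handler ever became free
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            if (regionLock.writeLock().isHeldByCurrentThread()) {
                regionLock.writeLock().unlock();
            }
            handlers.shutdownNow();
        }
    }
}
```

Calling `metaEditCompletes(3, true)` models the reported hang (the call times out), while `metaEditCompletes(3, false)` models releasing the lock before the RPC.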

      Attachments

      1. hbase-2964.txt (2 kB, Todd Lipcon)

          Activity

          Todd Lipcon added a comment -

          Fixing this is a little tricky. We could short-circuit the IPC path when detecting that a region is hosted in the same process, and thus avoid going through handlers (this is what the datanode does in the block recovery code). However, you still can have a situation where two regionservers are trying to talk to each other and end up in a deadlock.

          Another option is to add a timeout to these RPCs, abort the split and try again later if it fails.
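
A minimal sketch of the timeout idea, assuming the META update can be wrapped in a callback. `MetaRpc` and `tryMetaEdit` are hypothetical names used only for illustration, not HBase APIs; the caller is expected to roll the split back and retry later when `false` is returned.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedMetaEdit {

    /** Hypothetical stand-in for the META update RPC. */
    interface MetaRpc {
        void call() throws Exception;
    }

    /**
     * Runs the RPC on a helper thread and waits at most timeoutMs.
     * Returns false if the RPC failed or timed out, so the caller can abort the split.
     */
    static boolean tryMetaEdit(MetaRpc rpc, long timeoutMs) {
        ExecutorService caller = Executors.newSingleThreadExecutor();
        try {
            Future<?> pending = caller.submit(() -> {
                rpc.call();
                return null;
            });
            try {
                pending.get(timeoutMs, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException | ExecutionException e) {
                pending.cancel(true); // interrupt the stuck call
                return false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                pending.cancel(true);
                return false;
            }
        } finally {
            caller.shutdownNow();
        }
    }
}
```

As noted in the comment above, a timeout only turns a permanent hang into a retry; it does not remove the underlying lock-while-RPC problem.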

          Another thing that might help is to have the start of the split transaction flag the table as "going offline", and before taking the readlock, other accessors of the table can check for this case and immediately throw NSRE rather than blocking once the split is in progress.

          Todd Lipcon added a comment -

          HBASE-2782 is another solution - if we provide QoS for the META and ROOT tables so they always have a few reserved handlers in a separate thread pool, we can avoid this issue.
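
The reserved-handler idea can be sketched as two thread pools with a dispatcher that routes by table name. This is an illustration only and not the HBASE-2782 design: `PriorityDispatch` and its methods are invented names, and the sketch assumes the catalog tables are identified by the names `.META.` and `-ROOT-`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical dispatcher: catalog-table calls never queue behind user-table calls. */
class PriorityDispatch {
    private final ExecutorService userHandlers;
    private final ExecutorService metaHandlers;

    PriorityDispatch(int userHandlerCount, int reservedMetaHandlerCount) {
        this.userHandlers = Executors.newFixedThreadPool(userHandlerCount);
        this.metaHandlers = Executors.newFixedThreadPool(reservedMetaHandlerCount);
    }

    /** Routes the call to the reserved pool when it targets a catalog table. */
    <T> Future<T> dispatch(String tableName, Callable<T> call) {
        boolean catalog = ".META.".equals(tableName) || "-ROOT-".equals(tableName);
        return (catalog ? metaHandlers : userHandlers).submit(call);
    }

    void shutdown() {
        userHandlers.shutdownNow();
        metaHandlers.shutdownNow();
    }
}
```

With this split, a META edit still gets a handler even when every user handler is blocked on a region lock.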

          stack added a comment -

          I agree this is a blocker on 0.90.x

          Todd Lipcon added a comment -

          As noted on the list, this seems to be due to HBASE-2461.

          Prior to 2461, when we split, we would close the region before doing any of the writes to META, and didn't hold any locks while doing the META updates. Now we keep the write lock all the way through, even after closing the region.

          I think simply moving the writeLock().unlock() up after the this.parent.close(false) in SplitTransaction should fix this issue. I'm testing that change on my test cluster now.

          Todd Lipcon added a comment -

          I also had to move the "new HTable" call outside of the lock, since the HTable constructor does an RPC.

          This patch seems to fix the issue for me. Running an overnight load test - if it's still going in the morning I'd say we're good.

          Todd Lipcon added a comment -

          Overnight test completed OK with that patch. I think we should rebuild the rc with this if Stack thinks it looks good.

          HBase Review Board added a comment -

          Message from: "Todd Lipcon" <todd@cloudera.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/798/
          -----------------------------------------------------------

          Review request for hbase and stack.

          Summary
          -------

          Moves all RPCs outside of the region writeLock - the writeLock is now only used long enough to set the 'closing' flag. When we drop the lock any waiters will see 'closing' upon acquiring the lock, and thus throw NSRE.

          In the case that we abort the split, it will reopen the region as before. Accessors will have gotten NSRE but will just come back to the same region eventually.
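
The pattern in this summary (take the write lock just long enough to set a flag, then let readers fail fast) might look roughly like the sketch below. It is not the patch itself: `ClosingFlagRegion` is an invented class, and `NotServingRegion` is an unchecked stand-in for HBase's NotServingRegionException (NSRE).

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ClosingFlagRegion {

    /** Stand-in for NSRE; unchecked here only to keep the sketch short. */
    static class NotServingRegion extends RuntimeException {
        NotServingRegion(String message) {
            super(message);
        }
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean closing = false; // guarded by lock

    /** Holds the write lock only long enough to flip the flag; no RPCs happen under it. */
    void markClosing() {
        lock.writeLock().lock();
        try {
            closing = true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Used when a split is aborted and the region goes back online. */
    void reopen() {
        lock.writeLock().lock();
        try {
            closing = false;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Readers that were waiting see the flag as soon as they get the read lock. */
    String get(String row) {
        lock.readLock().lock();
        try {
            if (closing) {
                throw new NotServingRegion("region is closing");
            }
            return "value-of-" + row;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

A client that receives the exception would re-locate the region and retry, which is why an aborted split is harmless: the retry simply lands on the reopened parent.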

          This addresses bug HBASE-2964.
          http://issues.apache.org/jira/browse/HBASE-2964

          Diffs


          src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 3507c0d

          Diff: http://review.cloudera.org/r/798/diff

          Testing
          -------

          YCSB testing on my cluster - it used to deadlock due to this bug within an hour. I ran a 5 hour load test overnight and it worked OK.

          Thanks,

          Todd

          HBase Review Board added a comment -

          Message from: stack@duboce.net

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/798/#review1110
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
          <http://review.cloudera.org/r/798/#comment3770>

          Let me make a version of this patch that takes care of rollback – currently rollback expects the lock to be held on entrance; this will not be the case post close if above applied.

          • stack
          stack added a comment -

          Hmmm... now I'm thinking instead that we punt on locking up here in SplitTransaction completely. The core issue comes from an incorrect mapping of the old splitLock onto the new region 'lock'. Looking at what was done under the old splitLock, it all looks safe in the face of concurrency. Down in the region close, it's already taking out the region write lock. Let me make a different kinda patch.

          HBase Review Board added a comment -

          Message from: stack@duboce.net

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/798/
          -----------------------------------------------------------

          (Updated 2010-09-07 13:38:39.968517)

          Review request for hbase and stack.

          Changes
          -------

          This version removes from SplitTransaction the setting of the this.parent.lock completely. It's not needed. Down in the parent close, it takes out the write lock.

          In the past, we had a split lock and a close lock (splitLock and splitsAndClosesLock). The split lock was held across the split while daughter regions were calculated and during the close, the actual split, and the update of .META. As part of lock pruning, an error made in HBASE-2461 was using splitsAndClosesLock where splitLock was used previously (and even expanding the scope of what splitLock used to cover).

          Looking, splitLock looks like it could have served some purpose preventing two threads contending over splitting (splits make objects in filesystem and move stuff around), but we don't really need this in current HBase since only CompactSplitThread runs splits – even in new master regime where client can call a splitRegion. Later when we want to run multiple concurrent split transactions, we'll need to reexamine.

          Summary
          -------

          Moves all RPCs outside of the region writeLock - the writeLock is now only used long enough to set the 'closing' flag. When we drop the lock any waiters will see 'closing' upon acquiring the lock, and thus throw NSRE.

          In the case that we abort the split, it will reopen the region as before. Accessors will have gotten NSRE but will just come back to the same region eventually.

          This addresses bug HBASE-2964.
          http://issues.apache.org/jira/browse/HBASE-2964

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java a692125
          src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 3507c0d
          src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java a245d97

          Diff: http://review.cloudera.org/r/798/diff

          Testing
          -------

          YCSB testing on my cluster - it used to deadlock due to this bug within an hour. I ran a 5 hour load test overnight and it worked OK.

          Thanks,

          Todd

          HBase Review Board added a comment -

          Message from: "Todd Lipcon" <todd@cloudera.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/798/#review1122
          -----------------------------------------------------------

          Seems to make sense. Let me try it on a cluster before I +1 it

          src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
          <http://review.cloudera.org/r/798/#comment3823>

          maybe now we can do an:

          assert !this.parent.lock.writeLock().isHeldByCurrentThread() : "Unsafe to hold write lock while performing RPCs";

          • Todd
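
One practical note on the suggested check: `ReentrantReadWriteLock.WriteLock.isHeldByCurrentThread()` is a standard JDK method, but a Java `assert` statement only runs when assertions are enabled (for example with `-ea`). The sketch below shows an always-on variant of the same guard; `RpcLockGuard` and `checkSafeToRpc` are hypothetical names, not part of the patch.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RpcLockGuard {

    /** Throws if the calling thread still holds the write lock when it is about to make an RPC. */
    static void checkSafeToRpc(ReentrantReadWriteLock lock) {
        if (lock.writeLock().isHeldByCurrentThread()) {
            throw new IllegalStateException("Unsafe to hold write lock while performing RPCs");
        }
    }
}
```

An explicit check like this costs little and fails the same way in production and in tests, whereas the `assert` form is free in production but only guards runs that enable assertions.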
          Todd Lipcon added a comment -

          +1 to stack's patch from reviewboard. Imported about 550G overnight, worked OK.

          HBase Review Board added a comment -

          Message from: stack@duboce.net

          On 2010-09-07 18:33:16, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java, line 207
          > <http://review.cloudera.org/r/798/diff/2/?file=11132#file11132line207>
          >
          > maybe now we can do an:
          >
          > assert !this.parent.lock.writeLock().isHeldByCurrentThread() : "Unsafe to hold write lock while performing RPCs";

          I'll add in this assert

          • stack

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/798/#review1122
          -----------------------------------------------------------

          stack added a comment -

          Thanks for the review and for testing, Todd (applied to TRUNK and to the 0.89.20100830 branch).


            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon