HBase
HBASE-2077

NullPointerException with an open scanner that expired causing an immediate region server shutdown

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.20.2, 0.20.3
    • Fix Version/s: 0.90.4
    • Component/s: regionserver
    • Labels:
      None
    • Environment:

      Hadoop 0.20.0, Mac OS X, Java 6

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Removes lease from lease monitor while operation is running inside the server.

      Description

      2009-12-29 18:05:55,432 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -4250070597157694417 lease expired
      2009-12-29 18:05:55,443 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
      java.lang.NullPointerException
      at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1310)
      at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:136)
      at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
      at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
      at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
      at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
      at java.util.PriorityQueue.poll(PriorityQueue.java:523)
      at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:113)
      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1776)
      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1719)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1944)
      at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
      2009-12-29 18:05:55,446 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 55260, call next(-4250070597157694417, 10000) from 192.168.1.90:54011: error: java.io.IOException: java.lang.NullPointerException
      java.io.IOException: java.lang.NullPointerException
      at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:869)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:859)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1965)
      at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
      Caused by: java.lang.NullPointerException
      at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1310)
      at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:136)
      at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
      at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
      at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
      at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
      at java.util.PriorityQueue.poll(PriorityQueue.java:523)
      at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:113)
      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1776)
      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1719)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1944)
      ... 5 more
      2009-12-29 18:05:55,447 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(-4250070597157694417, 10000) from 192.168.1.90:54011: output error
      2009-12-29 18:05:55,448 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 55260 caught: java.nio.channels.ClosedChannelException
      at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
      at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
      at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1125)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:615)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:679)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:943)

      2009-12-29 18:05:56,322 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 55260
      2009-12-29 18:05:56,322 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 55260

      Attachments

      1. [Bug_HBASE-2077]_Fixes_a_very_rare_race_condition_between_lease_expiration_and_renewal.patch
        1 kB
        Sam Pullara
      2. HBASE-2077-redux.patch
        2 kB
        Jean-Daniel Cryans
      3. HBASE-2077-3.patch
        5 kB
        Jean-Daniel Cryans
      4. 2077-suggestion.txt
        4 kB
        stack
      5. 2077-v4.txt
        6 kB
        stack

          Activity

          Sam Pullara added a comment -

          jdcryans helped narrow it down to an issue in Leases where the lease would be polled from the queue at the same time it was being renewed. Since there is no synchronization protection, there is a race condition. I am attaching a patch that should fix the problem, though it is very difficult to reproduce.
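
          To make the race concrete, here is a minimal, hypothetical sketch of the pattern being described (the SketchLease/SketchLeases names are illustrative, not the actual org.apache.hadoop.hbase.Leases code): if the monitor thread polls a lease out of the queue while a handler thread is renewing it, the renewal is lost and the scanner gets closed underneath a running next() call; taking the same lock on both paths is one way to close that window.

          import java.util.concurrent.DelayQueue;
          import java.util.concurrent.Delayed;
          import java.util.concurrent.TimeUnit;

          // Illustrative sketch only -- not the real org.apache.hadoop.hbase.Leases code.
          class SketchLease implements Delayed {
            private final long periodMs;
            private volatile long expiresAt;

            SketchLease(long periodMs) {
              this.periodMs = periodMs;
              this.expiresAt = System.currentTimeMillis() + periodMs;
            }

            void reset() { expiresAt = System.currentTimeMillis() + periodMs; }

            @Override public long getDelay(TimeUnit unit) {
              return unit.convert(expiresAt - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
            }

            @Override public int compareTo(Delayed other) {
              return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
            }
          }

          class SketchLeases {
            private final DelayQueue<SketchLease> queue = new DelayQueue<SketchLease>();

            // Renewal removes the lease and re-adds it with a fresh expiration.
            // Holding the same lock as the monitor thread keeps the lease from
            // being expired in the middle of a renewal.
            void renewLease(SketchLease lease) {
              synchronized (queue) {
                if (queue.remove(lease)) {
                  lease.reset();
                  queue.add(lease);
                }
              }
            }

            // Monitor side: poll() only returns leases whose delay has elapsed.
            void leaseCheck() {
              SketchLease expired;
              synchronized (queue) {
                expired = queue.poll();
              }
              if (expired != null) {
                // close the scanner registered under this lease ...
              }
            }
          }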

          Sam Pullara added a comment -

          Patch to fix race condition

          Jean-Daniel Cryans added a comment -

          I ran the client test and it passes. Committed to branch and trunk. Made Sam a new contributor, thanks for the patch!

          stack added a comment -

          Patch looks good to me.

          Jean-Daniel Cryans added a comment -

          This is not fixed in 0.20.3; even though the patch got in, we still get the error:

          2010-02-22 19:18:06,638 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -4405675591371793964 lease expired
          2010-02-22 19:18:06,639 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: 
          java.lang.NullPointerException
          	at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1310)
          	at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:136)
          	at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
          	at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
          	at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
          	at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
          	at java.util.PriorityQueue.poll(PriorityQueue.java:523)
          	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:113)
          	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1807)
          	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1771)
          	at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1894)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          	at java.lang.reflect.Method.invoke(Method.java:597)
          	at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
          	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
          
          Jean-Daniel Cryans added a comment -

          The way we are using the DelayQueue looks broken. We wait for leaseCheckFrequency (or the delay time, whichever happens first) in Leases and then we just expire the scanner. I think there are times when we expire scanners that aren't actually done yet. This patch adds a check in run() to verify the element is really expired.
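
          A minimal sketch of the kind of guard being described, assuming the monitor loop can end up holding a lease whose expiration time has not actually passed (leaseQueue, leaseCheckFrequencyMs and expire() are illustrative names, not the real Leases internals):

          import java.util.concurrent.DelayQueue;
          import java.util.concurrent.Delayed;
          import java.util.concurrent.TimeUnit;

          // Illustrative sketch of the extra expiry check; not the real Leases.run().
          class ExpiryCheckSketch<L extends Delayed> {
            private final DelayQueue<L> leaseQueue = new DelayQueue<L>();
            private final long leaseCheckFrequencyMs = 1000;

            void runOnce() throws InterruptedException {
              // Wait up to leaseCheckFrequency for the next candidate lease.
              L lease = leaseQueue.poll(leaseCheckFrequencyMs, TimeUnit.MILLISECONDS);
              if (lease == null) {
                return;                                // nothing ready this round
              }
              if (lease.getDelay(TimeUnit.MILLISECONDS) > 0) {
                leaseQueue.add(lease);                 // not actually expired: put it back
                return;
              }
              expire(lease);                           // really expired: close the scanner
            }

            void expire(L lease) { /* notify the lease listener */ }
          }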

          Jean-Daniel Cryans added a comment -

          Moving in 0.20.4 and marking critical.

          Yoram Kulbak added a comment -

          Not sure whether this is related, but we've seen similar behaviour:

          We've seen several of these exact stack traces (on 0.20.4-dev), accompanied by an immediate region server shutdown. At first it seemed that this error caused the shutdown, but in every case, after a thorough examination of the logs, we also found a ZooKeeper session timeout. Eventually we discovered it was a faulty disk causing large delays at unpredictable times.

          In our cases we also had messages like:
          2010-02-05 16:44:20,821 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 63152ms, ten times longer than scheduled: 1000
          and
          2010-02-05 16:44:26,448 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expired

          I can provide more details if this is relevant.

          Jean-Daniel Cryans added a comment -

          @Yoram

          The NPEs are enough to kill a region server; what you pasted is a GC pause that took more than 1 minute. But I see how that could happen... If the user scans more than 1 row at a time (scanner caching) and the thread is paused during the scan, there's a good possibility that on the next iteration in HRS.next() the lease has already expired.

          I would love to see a log of such an event.
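
          As background, scanner caching is a client-side setting: each next() RPC asks the region server for a batch of rows rather than one. Below is a hedged example using the client API of that era ("mytable" is a placeholder table name); if a pause longer than the scanner lease period (hbase.regionserver.lease.period) hits while a batch is being served or processed, the lease monitor can expire the scanner before the next call arrives.

          import java.io.IOException;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.client.HTable;
          import org.apache.hadoop.hbase.client.Result;
          import org.apache.hadoop.hbase.client.ResultScanner;
          import org.apache.hadoop.hbase.client.Scan;

          // Illustrative client-side scan; "mytable" is a placeholder table name.
          public class ScanCachingExample {
            public static void main(String[] args) throws IOException {
              Configuration conf = HBaseConfiguration.create();
              HTable table = new HTable(conf, "mytable");
              Scan scan = new Scan();
              scan.setCaching(100);                     // rows fetched per next() RPC
              ResultScanner scanner = table.getScanner(scan);
              try {
                for (Result row : scanner) {
                  // expensive per-row work here widens the gap between next() RPCs,
                  // eating into the scanner lease on the region server
                }
              } finally {
                scanner.close();
              }
            }
          }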

          Ian Pye added a comment -

          I've applied the patch to 0.20.3, and things look like they are working fine for me now. The same map/reduce job which crashed with the above stack trace now completes without error.
          In my case, I have an expensive reduce phase running on 1701037 input rows. It looks like it was taking so long to complete that the scanner timeout problem was happening.

          Jean-Daniel Cryans added a comment -

          @Ian, that's good news.

          Also, I got Yoram's logs off-list; I'll take a look tomorrow and see if there's anything in the scope of this JIRA.

          Jean-Daniel Cryans added a comment -

          WRT Yoram's logs, it went like I thought: a scan begins before the GC pause, and when the pause ends the lease is expired and, around the same time, the scanner tries to finish. Does it make sense to handle such a case? From a client perspective, a pause of 1 minute has probably already timed it out.

          I will go forward and commit this patch as it fixed Ian's problem.

          Jean-Daniel Cryans added a comment -

          Second patch committed to branch and trunk.

          Jean-Daniel Cryans added a comment -

          Further testing shows that my patch in fact introduced a bug that kept leases open for a long time. Reopening to see if we can dig deeper into Ian's issue.

          Jean-Daniel Cryans added a comment -

          So I'm thinking about taking a new approach to this bug. Since the major problem here is that we need knowledge of any client using a scanner (or anything else lease-related), I think we should add a new AtomicInteger inside Leases.Lease and increment it every time a user renews that lease. When you are done with the lease-related action, you decrement the AtomicInteger. This protects us from any GC pause happening while, for example, a scanner is next'ing 100 rows and gets a 60-second pause right in the middle.
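
          A minimal sketch of this usage-count idea, with made-up field and method names (not the actual Leases.Lease implementation):

          import java.util.concurrent.atomic.AtomicInteger;

          // Illustrative sketch only; not the real Leases.Lease.
          class CountedLeaseSketch {
            private final AtomicInteger usersInside = new AtomicInteger(0);
            private final long periodMs;
            private volatile long expiresAt;

            CountedLeaseSketch(long periodMs) {
              this.periodMs = periodMs;
              this.expiresAt = System.currentTimeMillis() + periodMs;
            }

            // Called when a handler enters a lease-protected operation (e.g. next()).
            void startUsing() {
              usersInside.incrementAndGet();
              expiresAt = System.currentTimeMillis() + periodMs;
            }

            // Called in a finally block when the operation leaves the server.
            void stopUsing() {
              usersInside.decrementAndGet();
              expiresAt = System.currentTimeMillis() + periodMs;
            }

            // The lease monitor must not expire a lease that is currently in use,
            // even if a GC pause pushed it past its nominal expiration time.
            boolean mayExpire(long now) {
              return usersInside.get() == 0 && now >= expiresAt;
            }
          }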

          Jean-Daniel Cryans added a comment -

          Patch that implements my latest idea. The user has the choice of keeping track of the lease usage or not. Passes the tests that currently pass.

          Hbase Build Acct added a comment -

          What if you renewed the lease on entry and on the way out, as insurance against a long-running 'next' invocation? Would that be 'cleaner'?


          Jean-Daniel Cryans added a comment -

          How would it be cleaner?

          stack added a comment -

          No new long that you increment/decrement and keep an account of.

          The method name stays as renewLease rather than talk about increment/decrement ("Why do I have to do increment/decrement on a lease when all I'm interested in is lease renewal").

          Jean-Daniel Cryans added a comment -

          The method name stays as renewLease rather than talk about increment/decrement ("Why do I have to do increment/decrement on a lease when all I'm interested in is lease renewal").

          The bulk of the issue is about not timing out a lease that someone is currently using, whether there's a GC or not. To be certain we hold the lease during the whole operation, unless we renew the lease after each row, I don't see how we can ensure the same level of safety that my patch offers.

          This patch also allows multiple users to share the lease if it's needed (hence incrementing/decrementing).

          stack added a comment -

          bq. This patch also allows multiple users to share the lease if it's needed (hence incrementing/decrementing).

          This seems perverse to me. When would such a usecase make sense?

          bq. To be certain we hold the lease during the whole operation, unless we renew the lease after each row, I don't see how we can ensure the same level of safety that my patch offers.

          Ok. It makes sense that while the scanner is inside the server, the lease moves to a different 'state'. My suggestion doesn't cover the case of us timing out because of GC while the scanner is a server-side resident.

          Why not remove the lease on entry and then renew it on the way out? (IMO, the increment/decrement semantic is confusing).

          Jean-Daniel Cryans added a comment -

          Let's punt this to 0.20.5 then

          stack added a comment -

          Marking these as fixed against 0.21.0 rather than against 0.20.5.

          Todd Lipcon added a comment -

          What's the status of this patch in 0.20 branch? It seems it was committed then reverted, but the revert didn't actually do a full revert (left a new getExpirationTime() method in there). We're seeing the issue on 0.20.4. Does anyone have thoughts on a good fix?

          Jean-Daniel Cryans added a comment -

          The first patch was committed, the second reverted (maybe I forgot something in there, though). HBASE-2503 plays around the same part of the code, and I'm pretty sure it fixes the NPE, but I can't tell for sure since, to trip on it, you need a heavily GCing region server.

          So it should be fixed in 0.20.5, it would be awesome if you can confirm.

          ryan rawson added a comment -

          Core issue: no concurrency control between next() calls and lease timeouts. While a fix would be nice to have, this is a corner case and can't hold up 0.90.

          stack added a comment -

          Here is a suggestion where we remove the lease from Leases while we are processing a request, then on the way out, in a finally, we renew the lease.
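
          This is the idea the release note above describes ("Removes lease from lease monitor while operation is running inside the server"). A hedged sketch of its shape follows; the Leases/Lease interfaces and the doScan() helper are stand-ins, not the actual HRegionServer or Leases API:

          import java.io.IOException;

          // Illustrative sketch of the remove-then-renew pattern.
          class ScannerLeaseSketch {
            interface Lease {}
            interface Leases {
              Lease removeLease(String name);   // take the lease out of the monitor
              void addLease(Lease lease);       // put it back with a fresh expiration
            }

            private final Leases leases;
            ScannerLeaseSketch(Leases leases) { this.leases = leases; }

            String[] next(long scannerId, int nbRows) throws IOException {
              // While the request is inside the server the lease is not in the
              // monitor, so it cannot expire no matter how long the work takes.
              Lease lease = leases.removeLease(String.valueOf(scannerId));
              try {
                return doScan(scannerId, nbRows);        // the real scanning work
              } finally {
                // Renew on the way out: the lease clock only runs between client calls.
                leases.addLease(lease);
              }
            }

            private String[] doScan(long scannerId, int nbRows) { return new String[0]; }
          }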

          stack added a comment -

          Ahemm.. this is a version that actually works (TestFromClientSide is a good test for this change).

          Jean-Daniel Cryans added a comment -

          +1 on latest patch, I like it.

          stack added a comment -

          Applied 2077-v4.txt to branch and trunk as a 'part2' on this issue. I'm now closing this since it's gone all over the place. We've not seen Sam's original issue in a while and it'll look different in the current codebase; let's open a new issue then.

          Jean-Daniel Cryans added a comment -

          Yeah the NPEs are gone for me but I was still able to easily get myself into a situation that triggered expired leases by just scanning a table. This patch will solve that nicely.

          Hudson added a comment -

          Integrated in HBase-TRUNK #1995 (See https://builds.apache.org/job/HBase-TRUNK/1995/)

          Todd Lipcon added a comment -

          This is long since committed, but just a request:

          In the future, could we open separate JIRAs rather than doing a "part 2" when the commits are more than a day apart? It's very difficult to figure out what went on in the history of this JIRA: it was committed for 0.20 in Dec '09, briefly amended in Feb '10, the amendment partially reverted the next day, and then another change went in for 0.90.4 in Jun '11 to solve an entirely different bug than the description indicates. This makes it very difficult to support past branches or maintain distributions, since it appears this was fixed long ago but in fact 0.90.3 lacks a major part of the JIRA.

          stack added a comment -

          Sorry Todd. Will be better going forward.


            People

            • Assignee:
              Sam Pullara
            • Reporter:
              Sam Pullara
            • Votes:
              2
            • Watchers:
              10


                Time Tracking

                Estimated: 1h
                Remaining: 1h
                Logged: Not Specified
