HBASE-3666: TestScannerTimeout fails occasionally

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.90.1
    • Fix Version/s: 0.90.2
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      If I loop TestScannerTimeout, it eventually fails with:

      org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4526340287831625207' does not exist
      at org.apache.hadoop.hbase.regionserver.Leases.cancelLease(Leases.java:209)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1816)
      ...
      at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:83)
      at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:38)
      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1003)
      at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1103)
      at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1175)
      at org.apache.hadoop.hbase.client.TestScannerTimeout.test2772(TestScannerTimeout.java:133)

      I think the issue is a race: at the top of HRegionServer.next the scanner still exists, but by the time execution reaches cancelLease, its lease has timed out.
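
      The suspected race can be illustrated with a minimal, self-contained sketch. This is not the attached patch and not HBase's real code: the classes LeaseRaceSketch, Leases, and LeaseException below, and the methods racyNext and tolerantNext, are hypothetical stand-ins, and the expiry thread is simulated by a callback that runs between the existence check and the cancel.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class LeaseRaceSketch {
    /** Hypothetical stand-in for the exception seen in the stack trace. */
    static class LeaseException extends Exception {
        LeaseException(String msg) { super(msg); }
    }

    /** Hypothetical, simplified lease table; not HBase's Leases class. */
    static class Leases {
        private final Set<String> live = ConcurrentHashMap.newKeySet();

        void createLease(String name) { live.add(name); }

        boolean exists(String name) { return live.contains(name); }

        /** A real server would call this from an expiry thread; here it is called by hand. */
        void expire(String name) { live.remove(name); }

        void cancelLease(String name) throws LeaseException {
            if (!live.remove(name)) {
                throw new LeaseException("lease '" + name + "' does not exist");
            }
        }
    }

    /** Check-then-act: the lease can expire between exists() and cancelLease(). */
    static String racyNext(Leases leases, String scanner, Runnable betweenCheckAndCancel)
            throws LeaseException {
        if (!leases.exists(scanner)) {
            return "unknown scanner";
        }
        betweenCheckAndCancel.run(); // simulates the expiry thread winning the race
        leases.cancelLease(scanner); // throws if the lease expired in the meantime
        return "ok";
    }

    /** One possible mitigation (an assumption): treat a missing lease as an expired scanner. */
    static String tolerantNext(Leases leases, String scanner, Runnable betweenCheckAndCancel) {
        if (!leases.exists(scanner)) {
            return "unknown scanner";
        }
        betweenCheckAndCancel.run();
        try {
            leases.cancelLease(scanner);
        } catch (LeaseException e) {
            return "scanner expired";
        }
        return "ok";
    }

    public static void main(String[] args) {
        Leases leases = new Leases();
        leases.createLease("scanner-1");
        try {
            racyNext(leases, "scanner-1", () -> leases.expire("scanner-1"));
        } catch (LeaseException e) {
            System.out.println("racy: " + e.getMessage());
        }
        leases.createLease("scanner-2");
        System.out.println("tolerant: "
                + tolerantNext(leases, "scanner-2", () -> leases.expire("scanner-2")));
    }
}
```

      In the sketch, the racy variant surfaces the same "does not exist" error as the test failure, while the tolerant variant reports the scanner as expired. Whether the server should report an expired scanner or something else in this situation is a design choice that the sketch does not settle.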

      1. hbase-3666.txt
        1 kB
        Todd Lipcon

        Activity

        Hudson added a comment -

        Integrated in HBase-TRUNK #1814 (See https://hudson.apache.org/hudson/job/HBase-TRUNK/1814/)

        stack added a comment -

        Applied branch and trunk.

        stack added a comment -

        True. I'm committing your patch as is, Todd.

        Todd Lipcon added a comment -

        If the fs went away, the RS will also have aborted, in which case it is also shutting down.

        stack added a comment -

        It looks like it's usually shutdown only, but it seems it could also be because we determined the fs went away.

        Todd Lipcon added a comment -

        checkOpen in this case is on the HRegionServer, not the region, so I think it's only on shutdown

        stack added a comment -

        +1, but I would change the message to be more generic. Can checkOpen fail because the region is closing, or for some reason other than shutdown?

        Todd Lipcon added a comment -

        Proposed fix.


          People

          • Assignee: Todd Lipcon
          • Reporter: Todd Lipcon
          • Votes: 0
          • Watchers: 0
