  1. HBase
  2. HBASE-14420 Zombie Stomping Session
  3. HBASE-14495

TestHRegion#testFlushCacheWhileScanning goes zombie

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: test
    • Labels: None
    • Release Note: The WAL append was changed by HBASE-12751. Every append now sets a latch on an edit. The latch needs to be cleared or else the WAL will hang. The original failures in TestHRegion turned up 'holes' where we were failing to throw the latch if we skipped out early because we were interrupted. Other 'holes' were found where we had mocked up a WAL so the latch would just stay in place. Further holes were found appending WAL markers; here we were skipping the mvcc completely for a few edits. A cleanup of WALUtils made all markers take the same code paths.
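
The kind of 'hole' described in the release note can be sketched as below. This is a minimal illustration only, not the HBase code: `LatchSketch`, `Edit`, `appendLeaky` and `appendSafe` are hypothetical names, and a plain `java.util.concurrent.CountDownLatch` stands in for whatever latch the WAL actually puts on an edit. The point is just that an early return on interrupt skips the release unless the release sits in a `finally`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchSketch {
    /** Hypothetical stand-in for a WAL edit that carries a latch; not the HBase class. */
    static final class Edit {
        private final CountDownLatch latch = new CountDownLatch(1);

        void release() {
            latch.countDown();
        }

        /** Returns true if the latch was released within the given time. */
        boolean awaitReleased(long millis) {
            try {
                return latch.await(millis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** The 'hole': bailing out early leaves the latch in place. */
    static void appendLeaky(Edit edit, boolean interrupted) {
        if (interrupted) {
            return; // early exit, latch never released, waiters hang
        }
        edit.release();
    }

    /** Release in finally so every exit path clears the latch. */
    static void appendSafe(Edit edit, boolean interrupted) {
        try {
            if (interrupted) {
                return;
            }
            // ... the normal append work would go here
        } finally {
            edit.release();
        }
    }

    public static void main(String[] args) {
        Edit leaky = new Edit();
        appendLeaky(leaky, true);
        System.out.println("leaky released: " + leaky.awaitReleased(100)); // false

        Edit safe = new Edit();
        appendSafe(safe, true);
        System.out.println("safe released: " + safe.awaitReleased(100)); // true
    }
}
```

The timed wait is only there so the demo terminates; a waiter with no timeout would stay parked on the leaky edit, which is the zombie behaviour.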

    Description

      This test goes zombie on us, most recently, here: https://builds.apache.org/job/PreCommit-HBASE-Build/15744//console

      It does not fail in runs on my internal rig, nor locally on my laptop when run in a loop.

      It's hung up in close of the region:

      "main" prio=10 tid=0x00007fc49800a800 nid=0x6053 in Object.wait() [0x00007fc4a02c9000]
         java.lang.Thread.State: WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	- waiting on <0x00000007d07c3478> (a java.lang.Object)
      	at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.waitForRead(MultiVersionConcurrencyControl.java:207)
      	- locked <0x00000007d07c3478> (a java.lang.Object)
      	at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.completeAndWait(MultiVersionConcurrencyControl.java:143)
      	at org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2257)
      	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2061)
      	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2026)
      	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2016)
      	at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1423)
      	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
      	- locked <0x00000007d07c34a8> (a java.lang.Object)
      	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1295)
      	at org.apache.hadoop.hbase.HBaseTestingUtility.closeRegionAndWAL(HBaseTestingUtility.java:352)
      	at org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushCacheWhileScanning(TestHRegion.java:3756)
      

      It is waiting on mvcc to catch up.
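
To make "waiting on mvcc to catch up" concrete, here is a much-simplified sketch of the idea, assuming a read point that only advances over a contiguous run of completed write entries. It is not the real MultiVersionConcurrencyControl: `MvccSketch` and its members are hypothetical, and a timeout is added so the demo returns instead of parking forever in `Object.wait()` as the trace shows. If an earlier write entry is begun but never completed, the read point cannot pass it, so a later `completeAndWait` never gets what it is waiting for.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MvccSketch {
    /** Hypothetical write entry: a sequence number plus a completed flag. */
    static final class WriteEntry {
        final long writeNumber;
        boolean completed;

        WriteEntry(long writeNumber) {
            this.writeNumber = writeNumber;
        }
    }

    private final Deque<WriteEntry> queue = new ArrayDeque<>();
    private long readPoint = 0;
    private long writePoint = 0;

    /** Start a write: hand out the next number and remember the entry. */
    synchronized WriteEntry begin() {
        WriteEntry e = new WriteEntry(++writePoint);
        queue.addLast(e);
        return e;
    }

    /** Mark an entry done; the read point advances only past a completed prefix. */
    synchronized void complete(WriteEntry e) {
        e.completed = true;
        while (!queue.isEmpty() && queue.peekFirst().completed) {
            readPoint = queue.removeFirst().writeNumber;
        }
        notifyAll();
    }

    /** Complete the entry, then wait for the read point to reach it (or time out). */
    synchronized boolean completeAndWait(WriteEntry e, long timeoutMillis) {
        complete(e);
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (readPoint < e.writeNumber) {
            long left = deadline - System.currentTimeMillis();
            if (left <= 0) {
                return false; // would be a hang without the demo timeout
            }
            try {
                wait(left);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        MvccSketch mvcc = new MvccSketch();
        WriteEntry skipped = mvcc.begin(); // an edit whose entry is never completed
        WriteEntry flush = mvcc.begin();   // the flush-time entry behind it

        System.out.println("caught up: " + mvcc.completeAndWait(flush, 100)); // false
        mvcc.complete(skipped);            // plug the hole
        System.out.println("caught up: " + mvcc.completeAndWait(flush, 100)); // true
    }
}
```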

      There is this comment at the point where we are hung:

      // TODO: Lets see if we hang here, if there is a scenario where an outstanding reader
      // with a read point is in advance of this write point.
      mvcc.completeAndWait(writeEntry);

      The above came in with HBASE-12751. The comment was added at v29:

      https://issues.apache.org/jira/secure/attachment/12754775/12751.rebased.v29.txt

      Looks like I added it, so I must have suspected that this might be dodgy... Let me take a look...

      Attachments

        1. 14495.txt
          6 kB
          Michael Stack
        2. 14495.txt
          3 kB
          Michael Stack
        3. 14495v3.txt
          17 kB
          Michael Stack
        4. 14495v6.txt
          24 kB
          Michael Stack
        5. 14495v7.txt
          30 kB
          Michael Stack
        6. 14495v9.txt
          34 kB
          Michael Stack

          People

            Assignee: Michael Stack (stack)
            Reporter: Michael Stack (stack)
            Votes: 0
            Watchers: 4
