HBASE-10751

TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.96.2, 0.98.1, 0.99.0, 0.94.18
    • Component/s: None
    • Labels: None

      Description

      I saw this here https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/213/testReport/junit/org.apache.hadoop.hbase.regionserver/TestHRegion/testWritesWhileScanning/

      The HBASE-10514 patch looks to have exposed a problem in our HStore commit logic. We are supposed to crash out if we fail to write, but here we keep going. I am having trouble figuring out why. Let me write a little test. Here is the log from the failed run:

      2014-03-14 01:58:48,647 DEBUG [Thread-3] regionserver.HRegionFileSystem(339): Committing store file /home/jenkins/jenkins-slave/workspace/HBase-0.98-on-Hadoop-1.1/0.98-hadoop1.1/hbase-server/target/test-data/f7999012-e166-4619-ab3c-5014e0f65007/data/default/testWritesWhileScanning/306ea000673d780f06daf2469e7f9bab/.tmp/a0e6579af25f463ebb7eebe3c043b8a0 as /home/jenkins/jenkins-slave/workspace/HBase-0.98-on-Hadoop-1.1/0.98-hadoop1.1/hbase-server/target/test-data/f7999012-e166-4619-ab3c-5014e0f65007/data/default/testWritesWhileScanning/306ea000673d780f06daf2469e7f9bab/family7/a0e6579af25f463ebb7eebe3c043b8a0
      2014-03-14 01:58:48,647 INFO  [Thread-2] regionserver.HRegion(5779): writing data to region testWritesWhileScanning,,1394762315120.306ea000673d780f06daf2469e7f9bab. with WAL disabled. Data may be lost in the event of a crash.
      2014-03-14 01:58:48,648 ERROR [Thread-3] regionserver.HStore$StoreFlusherImpl(1964): Failed to commit store file /home/jenkins/jenkins-slave/workspace/HBase-0.98-on-Hadoop-1.1/0.98-hadoop1.1/hbase-server/target/test-data/f7999012-e166-4619-ab3c-5014e0f65007/data/default/testWritesWhileScanning/306ea000673d780f06daf2469e7f9bab/.tmp/a0e6579af25f463ebb7eebe3c043b8a0
      org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file file:/home/jenkins/jenkins-slave/workspace/HBase-0.98-on-Hadoop-1.1/0.98-hadoop1.1/hbase-server/target/test-data/f7999012-e166-4619-ab3c-5014e0f65007/data/default/testWritesWhileScanning/306ea000673d780f06daf2469e7f9bab/family7/a0e6579af25f463ebb7eebe3c043b8a0
      	at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:552)
      	at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:580)
      	at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1019)
      	at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:211)
      	at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:350)
      	at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:445)
      	at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:551)
      	at org.apache.hadoop.hbase.regionserver.HStore.commitFile(HStore.java:842)
      	at org.apache.hadoop.hbase.regionserver.HStore.access$200(HStore.java:118)
      	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:1961)
      	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1706)
      	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1583)
      	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1498)
      	at org.apache.hadoop.hbase.regionserver.TestHRegion$FlushThread.run(TestHRegion.java:3034)
      Caused by: java.nio.channels.ClosedByInterruptException
      	at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
      	at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:282)
      	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.seek(RawLocalFileSystem.java:111)
      	at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:78)
      	at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
      	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:206)
      	at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:237)
      	at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
      	at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
      	at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
      	at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:384)
      	at org.apache.hadoop.fs.FSInputChecker.seek(FSInputChecker.java:365)
      	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.seek(ChecksumFileSystem.java:271)
      	at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
      	at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:389)
      	at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:537)
      	... 13 more
      2014-03-14 01:58:48,657 DEBUG [pool-1-thread-1] regionserver.HRegion(1037): Closing testWritesWhileScanning,,1394762315120.306ea000673d780f06daf2469e7f9bab.: disabling compactions & flushes
      2014-03-14 01:58:48,657 INFO  [pool-1-thread-1] regionserver.HRegion(1045): Running close preflush of testWritesWhileScanning,,1394762315120.306ea000673d780f06daf2469e7f9bab.
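
The `Caused by: java.nio.channels.ClosedByInterruptException` in the trace above is what Java NIO raises when a thread is interrupted while using an interruptible channel (the trace goes through `FileChannelImpl`). Below is a minimal sketch of that JDK behavior, not HBase code: the class name `InterruptedReadSketch` and the temp file are made up for illustration. It relies on the documented `InterruptibleChannel` contract that a thread whose interrupt status is already set receives `ClosedByInterruptException` on its next blocking channel operation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustration only: hypothetical class, unrelated to the HBase code base.
final class InterruptedReadSketch {

    /** Returns true if a read with the interrupt flag set raised ClosedByInterruptException. */
    static boolean readWhileInterrupted() {
        try {
            Path tmp = Files.createTempFile("interrupted-read", ".bin");
            try {
                Files.write(tmp, new byte[] {1, 2, 3, 4});
                try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    // Simulate the test interrupting its background thread on the way out.
                    Thread.currentThread().interrupt();
                    try {
                        channel.read(ByteBuffer.allocate(4));
                        return false;
                    } catch (ClosedByInterruptException expected) {
                        // The channel was closed because this thread is interrupted.
                        return true;
                    } finally {
                        // Clear the flag so cleanup below runs uninterrupted.
                        Thread.interrupted();
                    }
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ClosedByInterruptException seen: " + readWhileInterrupted());
    }
}
```

In the failing run the same kind of exception surfaces while the HFile trailer is being read, which is why it is reported as a `CorruptHFileException` even though nothing is wrong with the file itself.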
      
      1. 10751.txt (1 kB, stack)
      2. 10751.addendum.txt (0.7 kB, stack)

        Activity

        stack added a comment -

        So, the stack trace above is a bit of a red herring. It happens because we interrupt the test's background thread on our way out. The interrupt causes a DroppedSnapshotException (DSE) to be thrown, which we ignore because it happens when we are 'done'. Because we do not 'exit' on this DSE, the memory accounting is all off and we end up in a strange state: unable to flush successfully, yet memory accounting says there is stuff to flush (because we did not react to the original DSE).

        Let me apply this small patch so we just ignore the second DSE that happens on the way out (the reason this test failed).
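
A minimal sketch of the idea described above, not the committed patch: a background flush thread that records a failure only if it happens before the test has signalled it is done. All names here (`FlushThreadSketch`, `Flusher`, `FlushThread`, the two scenario methods) are hypothetical stand-ins; the real test calls into HRegion, which this sketch replaces with a plain interface so that it runs on its own.

```java
import java.io.IOException;

// Illustration only: hypothetical names, not the TestHRegion code.
final class FlushThreadSketch {

    /** Hypothetical stand-in for the region flush call made by the background thread. */
    interface Flusher {
        void flush() throws IOException;
    }

    static final class FlushThread extends Thread {
        private final Flusher flusher;
        private volatile boolean done;
        private volatile Throwable error;

        FlushThread(Flusher flusher) {
            this.flusher = flusher;
        }

        /** Called by the test on its way out: mark completion first, then interrupt. */
        void done() {
            done = true;
            interrupt();
        }

        Throwable error() {
            return error;
        }

        @Override
        public void run() {
            while (!done) {
                try {
                    flusher.flush();
                } catch (IOException e) {
                    // Only a failure seen before shutdown counts against the test.
                    if (!done) {
                        error = e;
                    }
                    break;
                }
            }
        }
    }

    /** A flush that fails only because the thread was interrupted at shutdown. */
    static Throwable failureDuringShutdown() {
        FlushThread t = new FlushThread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ie) {
                throw new IOException("interrupted while flushing", ie);
            }
        });
        t.start();
        t.done();
        return joinAndGetError(t);
    }

    /** A flush that fails while the test is still running. */
    static Throwable failureWhileRunning() {
        FlushThread t = new FlushThread(() -> {
            throw new IOException("real flush failure");
        });
        t.start();
        return joinAndGetError(t);
    }

    private static Throwable joinAndGetError(FlushThread t) {
        try {
            t.join();
        } catch (InterruptedException ie) {
            throw new IllegalStateException(ie);
        }
        return t.error();
    }

    public static void main(String[] args) {
        System.out.println("shutdown-time failure recorded: " + (failureDuringShutdown() != null));
        System.out.println("in-flight failure recorded: " + (failureWhileRunning() != null));
    }
}
```

Setting `done` before calling `interrupt()` matters in this sketch: any exception triggered by the interrupt is then guaranteed to see `done == true` and be ignored, while a failure during normal operation is still reported.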

        stack added a comment -

        Committed a small test change to 0.96-0.99.

        stack added a comment -

        Committed to 0.94 too after HBASE-10514 went in.

        Hudson added a comment -

        FAILURE: Integrated in hbase-0.96-hadoop2 #239 (See https://builds.apache.org/job/hbase-0.96-hadoop2/239/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577667)

        • /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in hbase-0.96 #349 (See https://builds.apache.org/job/hbase-0.96/349/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577667)

        • /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.98 #232 (See https://builds.apache.org/job/HBase-0.98/232/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577666)

        • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK #5012 (See https://builds.apache.org/job/HBase-TRUNK/5012/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577664)

        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #217 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/217/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577666)

        • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #118 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/118/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577664)

        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        stack added a comment -

        Addendum for 0.94. Without this, 0.94 does not compile.

        stack added a comment -

        I committed the addendum to 0.94.

        Lars Hofhansl added a comment -

        We just crossed updates. Strange that svn didn't complain. Let me clean up the imports.
        No more updates to 0.94, please; I'm trying to cut a release.

        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.94.18-security #7 (See https://builds.apache.org/job/HBase-0.94.18-security/7/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577784)

        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94-security #447 (See https://builds.apache.org/job/HBase-0.94-security/447/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577784)

        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.94.18 #17 (See https://builds.apache.org/job/HBase-0.94.18/17/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577784)

        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94 #1328 (See https://builds.apache.org/job/HBase-0.94/1328/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577784)

        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94-on-Hadoop-2 #57 (See https://builds.apache.org/job/HBase-0.94-on-Hadoop-2/57/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577784)

        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94-JDK7 #91 (See https://builds.apache.org/job/HBase-0.94-JDK7/91/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577784)

        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java

          People

          • Assignee: stack
          • Reporter: stack
          • Votes: 0
          • Watchers: 4
