HBase / HBASE-10751

TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.96.2, 0.98.1, 0.99.0, 0.94.18
    • Component/s: None
    • Labels:
      None

      Description

      I saw this failure here: https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/213/testReport/junit/org.apache.hadoop.hbase.regionserver/TestHRegion/testWritesWhileScanning/

      The HBASE-10514 patch looks to have exposed a problem in our HStore commit logic. We are supposed to crash out if we fail to write, but here we keep going. I am having trouble figuring out why. Let me write a little test:

      2014-03-14 01:58:48,647 DEBUG [Thread-3] regionserver.HRegionFileSystem(339): Committing store file /home/jenkins/jenkins-slave/workspace/HBase-0.98-on-Hadoop-1.1/0.98-hadoop1.1/hbase-server/target/test-data/f7999012-e166-4619-ab3c-5014e0f65007/data/default/testWritesWhileScanning/306ea000673d780f06daf2469e7f9bab/.tmp/a0e6579af25f463ebb7eebe3c043b8a0 as /home/jenkins/jenkins-slave/workspace/HBase-0.98-on-Hadoop-1.1/0.98-hadoop1.1/hbase-server/target/test-data/f7999012-e166-4619-ab3c-5014e0f65007/data/default/testWritesWhileScanning/306ea000673d780f06daf2469e7f9bab/family7/a0e6579af25f463ebb7eebe3c043b8a0
      2014-03-14 01:58:48,647 INFO  [Thread-2] regionserver.HRegion(5779): writing data to region testWritesWhileScanning,,1394762315120.306ea000673d780f06daf2469e7f9bab. with WAL disabled. Data may be lost in the event of a crash.
      2014-03-14 01:58:48,648 ERROR [Thread-3] regionserver.HStore$StoreFlusherImpl(1964): Failed to commit store file /home/jenkins/jenkins-slave/workspace/HBase-0.98-on-Hadoop-1.1/0.98-hadoop1.1/hbase-server/target/test-data/f7999012-e166-4619-ab3c-5014e0f65007/data/default/testWritesWhileScanning/306ea000673d780f06daf2469e7f9bab/.tmp/a0e6579af25f463ebb7eebe3c043b8a0
      org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file file:/home/jenkins/jenkins-slave/workspace/HBase-0.98-on-Hadoop-1.1/0.98-hadoop1.1/hbase-server/target/test-data/f7999012-e166-4619-ab3c-5014e0f65007/data/default/testWritesWhileScanning/306ea000673d780f06daf2469e7f9bab/family7/a0e6579af25f463ebb7eebe3c043b8a0
      	at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:552)
      	at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:580)
      	at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1019)
      	at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:211)
      	at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:350)
      	at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:445)
      	at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:551)
      	at org.apache.hadoop.hbase.regionserver.HStore.commitFile(HStore.java:842)
      	at org.apache.hadoop.hbase.regionserver.HStore.access$200(HStore.java:118)
      	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:1961)
      	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1706)
      	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1583)
      	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1498)
      	at org.apache.hadoop.hbase.regionserver.TestHRegion$FlushThread.run(TestHRegion.java:3034)
      Caused by: java.nio.channels.ClosedByInterruptException
      	at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
      	at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:282)
      	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.seek(RawLocalFileSystem.java:111)
      	at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:78)
      	at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
      	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:206)
      	at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:237)
      	at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
      	at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
      	at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
      	at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:384)
      	at org.apache.hadoop.fs.FSInputChecker.seek(FSInputChecker.java:365)
      	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.seek(ChecksumFileSystem.java:271)
      	at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
      	at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:389)
      	at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:537)
      	... 13 more
      2014-03-14 01:58:48,657 DEBUG [pool-1-thread-1] regionserver.HRegion(1037): Closing testWritesWhileScanning,,1394762315120.306ea000673d780f06daf2469e7f9bab.: disabling compactions & flushes
      2014-03-14 01:58:48,657 INFO  [pool-1-thread-1] regionserver.HRegion(1045): Running close preflush of testWritesWhileScanning,,1394762315120.306ea000673d780f06daf2469e7f9bab.
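The root cause at the bottom of the trace is java.nio.channels.ClosedByInterruptException. Interruptible NIO channels such as FileChannel close themselves when the thread using them is interrupted, whether the interrupt lands during a blocking call or before the next one, and that thread receives ClosedByInterruptException. The following is a minimal sketch of that behaviour in isolation, assuming a plain local file: a worker thread keeps seeking near the end of a scratch file and reading, loosely like loading a fixed-size trailer, and the main thread interrupts it the way a test interrupts its background thread on the way out. It calls FileChannel directly instead of going through Hadoop's RawLocalFileSystem as the trace does, and the class, method, and file names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;

public class InterruptClosesChannel {

    /** What the worker observed when its I/O was cut short (hypothetical helper type). */
    static final class Result {
        volatile Throwable caught;
        volatile boolean channelOpenAfterFailure = true;
    }

    static Result runDemo() {
        Result result = new Result();
        try {
            Path file = Files.createTempFile("fake-hfile", ".bin");
            Files.write(file, new byte[4096]);
            CountDownLatch reading = new CountDownLatch(1);

            Thread worker = new Thread(() -> {
                FileChannel ch = null;
                try {
                    ch = FileChannel.open(file, StandardOpenOption.READ);
                    ByteBuffer buf = ByteBuffer.allocate(512);
                    reading.countDown();
                    // Seek near the end and read, over and over. The loop never checks
                    // the interrupt flag itself; the channel does on every I/O call.
                    while (true) {
                        ch.position(ch.size() - buf.capacity());
                        buf.clear();
                        ch.read(buf);
                    }
                } catch (IOException e) {
                    result.caught = e;
                    result.channelOpenAfterFailure = ch != null && ch.isOpen();
                } finally {
                    reading.countDown(); // never leave main waiting if open() failed
                    if (ch != null) {
                        try { ch.close(); } catch (IOException ignored) { }
                    }
                }
            });

            worker.start();
            reading.await();
            worker.interrupt(); // like a test interrupting its background thread on the way out
            worker.join(10_000);
            Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = runDemo();
        System.out.println("worker caught: " + r.caught.getClass().getSimpleName()); // ClosedByInterruptException
        System.out.println("ClosedByInterrupt? " + (r.caught instanceof ClosedByInterruptException)); // true
        System.out.println("channel still open: " + r.channelOpenAfterFailure); // false
    }
}
```

The worker never calls Thread.isInterrupted() itself; the channel turns the interrupt into an I/O failure. In the log above, that same mechanism surfaces as a CorruptHFileException while HStore opens a reader on the store file it just committed.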
      
      Attachments
      1. 10751.txt (1 kB, stack)
      2. 10751.addendum.txt (0.7 kB, stack)

        Activity

        stack created issue -
        stack added a comment -

        So, the stack trace above is a bit of a red herring. It happens because we interrupt the test's background thread on our way out. The interrupt causes a DroppedSnapshotException (DSE) to be thrown, which we ignore because it happens when we are 'done'. Because we do not 'exit' on this DSE, the memory accounting is all off and we end up in a strange state: unable to flush successfully, yet the memory accounting says there is stuff to flush (because we did not react to the original DSE).

        Let me apply this small patch so we just ignore the second DSE that happens on the way out (the reason this test failed).
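The idea of the fix, tolerating a flush failure only once the background thread has been told to stop, can be sketched as follows. This is not the committed test change (that is the attached patch); FlushThread, FlushTarget, FakeDroppedSnapshotException and the done flag are hypothetical stand-ins for the test's background flusher, an HRegion, and HBase's DroppedSnapshotException. A failure seen while the test is still running is recorded; a failure seen after finish() has set the flag and interrupted the thread is ignored.

```java
import java.io.IOException;
import java.nio.channels.ClosedByInterruptException;
import java.util.concurrent.atomic.AtomicInteger;

public class ShutdownTolerantFlusher {

    /** Hypothetical stand-in for HBase's DroppedSnapshotException. */
    static class FakeDroppedSnapshotException extends IOException {
        FakeDroppedSnapshotException(String msg, Throwable cause) { super(msg, cause); }
    }

    /** Hypothetical flush target standing in for an HRegion. */
    interface FlushTarget { void flush() throws IOException; }

    /** Background flusher that tolerates a failure only after it has been told to stop. */
    static final class FlushThread extends Thread {
        private final FlushTarget target;
        private volatile boolean done;
        volatile Throwable error; // first failure seen while the test was still running

        FlushThread(FlushTarget target) { this.target = target; }

        @Override
        public void run() {
            while (!done) {
                try {
                    target.flush();
                } catch (IOException e) {
                    // Once done is set, the failure comes from our own interrupt: ignore it.
                    if (!done) {
                        error = e;
                    }
                    return;
                }
            }
        }

        /** Set the flag first, then interrupt, then wait for the thread to exit. */
        void finish() throws InterruptedException {
            done = true;
            interrupt();
            join();
        }
    }

    /** Shutdown case: flushes succeed until the flushing thread is interrupted. */
    static Throwable runShutdownScenario() {
        AtomicInteger flushes = new AtomicInteger();
        FlushThread t = new FlushThread(() -> {
            if (Thread.currentThread().isInterrupted()) {
                throw new FakeDroppedSnapshotException("flush aborted", new ClosedByInterruptException());
            }
            flushes.incrementAndGet();
        });
        try {
            t.start();
            while (flushes.get() < 3) {
                Thread.yield(); // let a few flushes succeed first
            }
            t.finish();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return t.error;
    }

    /** Mid-run case: the third flush fails while the test is still running. */
    static Throwable runMidRunFailureScenario() {
        AtomicInteger calls = new AtomicInteger();
        FlushThread t = new FlushThread(() -> {
            if (calls.incrementAndGet() == 3) {
                throw new FakeDroppedSnapshotException("flush failed", null);
            }
        });
        try {
            t.start();
            t.join(); // the thread exits on its own after recording the failure
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return t.error;
    }

    public static void main(String[] args) {
        System.out.println("error after shutdown: " + runShutdownScenario());
        System.out.println("error during run: " + runMidRunFailureScenario());
    }
}
```

Writing the volatile flag before calling interrupt() is the important ordering: an interrupt happens-before the interrupted thread detects it, so a flush that fails because of that interrupt is guaranteed to see done == true and is not mistaken for a real failure.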

        stack made changes -
        Field Original Value New Value
        Attachment 10751.txt [ 12634798 ]
        stack added a comment -

        Committed small test change to 0.96-0.99

        stack made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 0.96.2 [ 12325658 ]
        Fix Version/s 0.98.1 [ 12325664 ]
        Fix Version/s 0.99.0 [ 12325675 ]
        Resolution Fixed [ 1 ]
        stack added a comment -

        Committed to 0.94 too after HBASE-10514 went in.

        stack made changes -
        Fix Version/s 0.94.18 [ 12325952 ]
        Hudson added a comment -

        FAILURE: Integrated in hbase-0.96-hadoop2 #239 (See https://builds.apache.org/job/hbase-0.96-hadoop2/239/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577667)

        • /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in hbase-0.96 #349 (See https://builds.apache.org/job/hbase-0.96/349/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577667)

        • /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.98 #232 (See https://builds.apache.org/job/HBase-0.98/232/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577666)

        • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK #5012 (See https://builds.apache.org/job/HBase-TRUNK/5012/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577664)

        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #217 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/217/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577666)

        • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #118 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/118/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577664)

        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        stack added a comment -

        Addendum for 0.94. Without this, 0.94 does not compile.

        stack made changes -
        Attachment 10751.addendum.txt [ 12634892 ]
        stack added a comment -

        I committed the addendum to 0.94.

        Lars Hofhansl added a comment -

        We just crossed updates. Strange that svn didn't complain. Let me clean up the imports.
        No more updates to 0.94, please; I'm trying to cut a release.

        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.94.18-security #7 (See https://builds.apache.org/job/HBase-0.94.18-security/7/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577784)

        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94-security #447 (See https://builds.apache.org/job/HBase-0.94-security/447/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577784)

        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.94.18 #17 (See https://builds.apache.org/job/HBase-0.94.18/17/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577784)

        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94 #1328 (See https://builds.apache.org/job/HBase-0.94/1328/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577784)

        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94-on-Hadoop-2 #57 (See https://builds.apache.org/job/HBase-0.94-on-Hadoop-2/57/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577784)

        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94-JDK7 #91 (See https://builds.apache.org/job/HBase-0.94-JDK7/91/)
        HBASE-10751 TestHRegion testWritesWhileScanning occasional fail since HBASE-10514 went in (stack: rev 1577784)

        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        Lars Hofhansl made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition           Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open → Resolved      1h 12m                  1                 stack           14/Mar/14 19:17
        Resolved → Closed    178d 2h 6m              1                 Lars Hofhansl   08/Sep/14 21:23

          People

          • Assignee:
            stack
            Reporter:
            stack
          • Votes:
            0
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:
