Hadoop HDFS
HDFS-706

Intermittent failures in TestFiHFlush

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      While running tests on a Linux box, I've started seeing intermittent failures among the TestFiHFlush test cases.
      It turns out that occasional failures are also observed on my laptop running BSD.

      1. HDFS-706.patch
        9 kB
        Konstantin Boudnik
      2. HDFS-706.patch
        9 kB
        Konstantin Boudnik
      3. TEST-org.apache.hadoop.hdfs.TestHFlush.txt
        106 kB
        Eli Collins

          Activity

          Konstantin Boudnik added a comment -

          The failing tests are among hFlushFi0[1-3]_a. The test cases fail with the following diagnostics:

          FI: hFlushFi02_a, index=1, datanode=127.0.0.1:53701
          org.apache.hadoop.util.DiskChecker$DiskErrorException: FI: hFlushFi02_a, index=1, datanode=127.0.0.1:53701
                  at org.apache.hadoop.fi.FiHFlushTestUtil$DerrAction.run(FiHFlushTestUtil.java:55)
                  at org.apache.hadoop.fi.FiHFlushTestUtil$DerrAction.run(FiHFlushTestUtil.java:1)
                  at org.apache.hadoop.fi.FiTestUtil$ActionContainer.run(FiTestUtil.java:66)
                  at org.apache.hadoop.hdfs.HFlushAspects.ajc$after$org_apache_hadoop_hdfs_HFlushAspects$1$5141da0e(HFlushAspects.aj:54)
                  at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.hflush(DFSClient.java:3515)
                  at org.apache.hadoop.hdfs.TestHFlush.doTheJob(TestHFlush.java:112)
                  at org.apache.hadoop.hdfs.TestFiHFlush.runDiskErrorTest(TestFiHFlush.java:53)
                  at org.apache.hadoop.hdfs.TestFiHFlush.hFlushFi02_a(TestFiHFlush.java:109)
          

          The index reported in the exception might differ. The _a tests write a file where the write() operation stays within a block boundary; write() is followed by an hflush() call.
          hflush() has an injected fault, which throws DiskErrorException on one of the pipeline's nodes. If the pipeline doesn't exist, the exception won't be thrown, which has been the case for these tests ever since hflush() was implemented.

          Apparently something has changed.
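
The behavior described above, where the injected fault fires only if a pipeline exists, can be illustrated with a minimal standalone sketch. This is not the code in FiHFlushTestUtil or HFlushAspects: the class and method names, the stand-in exception type, and the datanode addresses other than the one from the trace are all assumptions made for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names come from the Hadoop source tree.
final class HFlushFaultSketch {

    /** Stand-in for DiskChecker.DiskErrorException so the sketch compiles without Hadoop. */
    static final class InjectedDiskError extends IOException {
        InjectedDiskError(String msg) {
            super(msg);
        }
    }

    /**
     * Simulates advice that runs after hflush(): if a pipeline exists and
     * contains the targeted index, fail as if that datanode hit a disk error.
     */
    static void afterHflush(String testName, List<String> pipeline, int faultIndex)
            throws InjectedDiskError {
        if (pipeline == null || pipeline.isEmpty()) {
            return; // no pipeline: the injected fault has nothing to fire on
        }
        if (faultIndex >= 0 && faultIndex < pipeline.size()) {
            throw new InjectedDiskError("FI: " + testName + ", index=" + faultIndex
                    + ", datanode=" + pipeline.get(faultIndex));
        }
    }

    /** Returns true if the injected fault fired for the given pipeline. */
    static boolean faultFires(List<String> pipeline, int faultIndex) {
        try {
            afterHflush("hFlushFi02_a", pipeline, faultIndex);
            return false;
        } catch (InjectedDiskError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Only the middle address appears in the trace; the others are made up.
        List<String> pipeline =
                Arrays.asList("127.0.0.1:53700", "127.0.0.1:53701", "127.0.0.1:53702");
        System.out.println("with pipeline:    " + faultFires(pipeline, 1));
        System.out.println("without pipeline: " + faultFires(Collections.<String>emptyList(), 1));
    }
}
```

Under these assumptions, a test that used to run without a pipeline at the hflush() call would never see the exception, and would start seeing it once a pipeline is present at that point.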

          Eli Collins added a comment -

          I saw TestFiHFlush fail (reproducibly) with the following. TestHFlush passes. After doing a clean build and re-running the test (which regenerates the fault injection version) the test passes. Uploaded the full test log.

          ------------- Standard Error -----------------
          java.lang.IndexOutOfBoundsException
          at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:151)
          at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1411)
          at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:2081)
          at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2131)
          at java.io.DataInputStream.read(DataInputStream.java:132)
          at org.apache.hadoop.hdfs.TestHFlush.doTheJob(TestHFlush.java:118)
          at org.apache.hadoop.hdfs.TestHFlush.hFlush_01(TestHFlush.java:41)

          Konstantin Boudnik added a comment -

          I assume you see this one as a result of

          org.apache.hadoop.util.DiskChecker$DiskErrorException: FI: DerrAction:hFlushFi02_c, index=1, datanode=
          

          If so, then yes, it is there and it is reproducible. It seems to result from a DiskErrorException on the second DN when the write crosses block and checksum boundaries. I have no idea why it happens only in this case, but it seems to be OK because the test gets the expected behavior, i.e. the exception goes through.
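
To make "across block and checksum boundaries" concrete, here is a small arithmetic sketch of when a write spans more than one unit of a given size. The class and method names are hypothetical, and the sizes in main() are illustrative values, not the ones configured by the test.

```java
// Hypothetical helper; not part of TestHFlush or DFSClient.
final class BoundarySketch {

    /** True if the byte range [offset, offset + length) touches more than one unit of unitSize bytes. */
    static boolean crosses(long offset, long length, long unitSize) {
        if (length <= 0) {
            return false; // an empty write cannot cross anything
        }
        long firstUnit = offset / unitSize;
        long lastUnit = (offset + length - 1) / unitSize;
        return firstUnit != lastUnit;
    }

    public static void main(String[] args) {
        long bytesPerChecksum = 512; // assumed checksum chunk size
        long blockSize = 1024;       // assumed, deliberately tiny block size

        long offset = 0;
        long length = 1100;          // assumed write length

        System.out.println("crosses checksum boundary: " + crosses(offset, length, bytesPerChecksum));
        System.out.println("crosses block boundary:    " + crosses(offset, length, blockSize));
    }
}
```

With these assumed sizes, a 1100-byte write starting at offset 0 crosses both kinds of boundary, which is the situation described for the failing case.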

          Konstantin Boudnik added a comment -

          This fixes the issue on top of what has been caused by HDFS-741.

          Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good.

          Konstantin Boudnik added a comment -

          This is the same patch as before, with a couple of commented-out lines removed.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12424811/HDFS-706.patch
          against trunk revision 835958.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/111/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/111/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/111/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/111/console

          This message is automatically generated.

          Konstantin Boudnik added a comment -

          The test failure is irrelevant to this patch.

          Konstantin Boudnik added a comment -

          I've run this patch with HDFS-741 committed and everything looks OK. Ready to commit.

          Konstantin Boudnik added a comment -

          I've just committed this to trunk and branch 0.21.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #114 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/114/)
          Intermittent failures in TestFiHFlush. Contributed by Konstantin Boudnik.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #145 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/145/)

          Hudson added a comment -

          Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #118 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/118/)

          Hudson added a comment -

          Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #81 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/81/)


            People

            • Assignee:
              Konstantin Boudnik
              Reporter:
              Konstantin Boudnik
            • Votes:
              0
              Watchers:
              2
