Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.3
    • Fix Version/s: 0.18.3
    • Component/s: test
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      TestFileCreation fails once in a while.

      1. 3883_20081007.patch
        13 kB
        Tsz Wo Nicholas Sze
      2. 3883_20081008.patch
        11 kB
        Tsz Wo Nicholas Sze
      3. 3883_20081008b.patch
        8 kB
        Tsz Wo Nicholas Sze
      4. 3883_20081008b_0.18.patch
        8 kB
        Tsz Wo Nicholas Sze

          Activity

          Lohit Vijayarenu added a comment -

          See
          http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2995/testReport/org.apache.hadoop.hdfs/TestFileCreation/testFileCreationSimulated/

          I see this NullPointerException in the log:

          2008-07-31 12:56:44,164 INFO  datanode.DataNode (DataNode.java:receiveBlock(2796)) - Exception in receiveBlock for block blk_1881319291974486876_1010 java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[closed]. 0 millis timeout left.
          2008-07-31 12:56:44,165 ERROR datanode.DataNode (DataNode.java:run(1094)) - DatanodeRegistration(127.0.0.1:39613, storageID=DS-1070412314-140.211.11.106-39613-1217508977073, infoPort=39614, ipcPort=39615):DataXceiver: java.lang.NullPointerException
          	at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:608)
          	at org.apache.hadoop.hdfs.server.datanode.DataNode.access$1400(DataNode.java:106)
          	at org.apache.hadoop.hdfs.server.datanode.DataNode$BlockReceiver.close(DataNode.java:2397)
          	at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:135)
          	at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:151)
          	at org.apache.hadoop.hdfs.server.datanode.DataNode$BlockReceiver.receiveBlock(DataNode.java:2798)
          	at org.apache.hadoop.hdfs.server.datanode.DataNode$DataXceiver.writeBlock(DataNode.java:1309)
          	at org.apache.hadoop.hdfs.server.datanode.DataNode$DataXceiver.run(DataNode.java:1071)
          	at java.lang.Thread.run(Thread.java:619)
          
          Vinod Kumar Vavilapalli added a comment -

          Another relevant failure: TestFileCreation.testClientTriggeredLeaseRecovery. Saw this while running HADOOP-4173 through Hudson. Please see http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3362/testReport/org.apache.hadoop.hdfs/TestFileCreation/testClientTriggeredLeaseRecovery/

          Error Message
          
          Could not obtain block: blk_4808292499378686614_1012 file=/wrwelkj/file0
          
          Stacktrace
          
          java.io.IOException: Could not obtain block: blk_4808292499378686614_1012 file=/wrwelkj/file0
          	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1693)
          	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1521)
          	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1648)
          	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1578)
          	at java.io.FilterInputStream.read(FilterInputStream.java:66)
          	at org.apache.hadoop.hdfs.TestFileCreation.testClientTriggeredLeaseRecovery(TestFileCreation.java:777)
          

          The test ran to successful completion on my machine though.

          Tsz Wo Nicholas Sze added a comment -

          TestFileCreation keeps failing occasionally. For example, see

          HADOOP-3786: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3369/testReport/
          HADOOP-4259: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3364/testReport/

          There are no code changes in the issues above.

          Robert Chansler added a comment -

          Gotta be in 0.19!

          Tsz Wo Nicholas Sze added a comment -

          One problem here is that the namenode assigns two or more generation stamps to the same block within a short period.

          The namenode should remember when it assigned the previous generation stamp. If a new generation stamp request for the same block arrives within a short period of the previous one, the namenode should reject the request.

          Tsz Wo Nicholas Sze added a comment -

          3883_20081007.patch:

          • Limit the namenode to assigning at most one generation stamp for a particular block within a short period. The period is 10 seconds, defined in FSConstants.LEASE_RECOVER_PERIOD.
          • Introduce GenerationStamp.UpdateException so that an update from a higher gs to a lower gs is rejected.
          • Introduce BlockRecoveryException. This fixes another bug: previously, DataNode.syncBlock(...) returned null for both the normal case and the error case.
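
The first bullet can be illustrated with a minimal sketch. This is not the code from the patch: the class GenerationStampLimiter, its method nextGenerationStamp, the starting stamp value, and the use of a plain HashMap are all hypothetical names and choices made for illustration. Only the 10-second period comes from the comment above, and the sketch assumes that FSConstants.LEASE_RECOVER_PERIOD is expressed in milliseconds.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-block generation stamp rate limiting; not the patch itself. */
class GenerationStampLimiter {
    /** Assumed to mirror FSConstants.LEASE_RECOVER_PERIOD (10 seconds, in milliseconds). */
    static final long LEASE_RECOVER_PERIOD_MS = 10_000L;

    /** Time (ms) at which each block last received a generation stamp. */
    private final Map<Long, Long> lastAssignTime = new HashMap<>();

    /** Arbitrary starting value, chosen only for the example. */
    private long currentStamp = 1000L;

    /**
     * Assigns a new generation stamp to the block unless one was already
     * assigned to the same block less than LEASE_RECOVER_PERIOD_MS ago.
     */
    synchronized long nextGenerationStamp(long blockId, long nowMs) throws IOException {
        Long previous = lastAssignTime.get(blockId);
        if (previous != null && nowMs - previous < LEASE_RECOVER_PERIOD_MS) {
            throw new IOException("Block " + blockId
                + " was assigned a generation stamp " + (nowMs - previous)
                + " ms ago; rejecting the request");
        }
        lastAssignTime.put(blockId, nowMs);
        return ++currentStamp;
    }

    public static void main(String[] args) {
        GenerationStampLimiter limiter = new GenerationStampLimiter();
        long blockId = 42L;
        try {
            System.out.println("assigned " + limiter.nextGenerationStamp(blockId, 0L));
            // A second request 5 seconds later falls inside the window and is rejected.
            limiter.nextGenerationStamp(blockId, 5_000L);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Passing the current time in as a parameter, rather than reading the system clock inside the method, is a choice made here so that the window can be exercised deterministically in a test.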
          Tsz Wo Nicholas Sze added a comment -

          3883_20081008.patch: use IOException instead of defining a class BlockRecoveryException. Updated javadoc.
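
A small sketch of why signalling the error case with an IOException helps, compared with returning null for both outcomes. Everything here is hypothetical: BlockSyncSketch, SyncedBlock, and the parameters healthyReplicas and alreadyClosed are illustrative stand-ins, not the actual signature or logic of DataNode.syncBlock(...). The sketch only assumes that null remains the result for the normal "nothing to return" case.

```java
import java.io.IOException;

/** Hypothetical sketch; the real DataNode.syncBlock(...) has a different signature. */
class BlockSyncSketch {
    /** Illustrative stand-in for the result of a successful synchronization. */
    static final class SyncedBlock {
        final long blockId;
        final long generationStamp;

        SyncedBlock(long blockId, long generationStamp) {
            this.blockId = blockId;
            this.generationStamp = generationStamp;
        }
    }

    /**
     * Returns the synchronized block, or null when there is nothing to return
     * (assumed normal case). Failure is reported by an IOException, so a null
     * result is no longer ambiguous.
     */
    static SyncedBlock syncBlock(long blockId, long newStamp,
                                 int healthyReplicas, boolean alreadyClosed)
            throws IOException {
        if (alreadyClosed) {
            return null; // normal case: nothing further to do
        }
        if (healthyReplicas == 0) {
            // Error case: previously indistinguishable from the normal case.
            throw new IOException("Cannot recover block " + blockId
                + ": no replica could be synchronized");
        }
        return new SyncedBlock(blockId, newStamp);
    }

    public static void main(String[] args) {
        try {
            syncBlock(1L, 2L, 0, false);
        } catch (IOException e) {
            System.out.println("recovery failed: " + e.getMessage());
        }
    }
}
```

With this shape, a caller that receives null knows the call completed normally, and a caller that needs to react to a failed recovery handles the exception instead of guessing.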

          Tsz Wo Nicholas Sze added a comment -

          3883_20081008b.patch: removed GenerationStamp.UpdateException since it might cause other problems. Thanks Dhruba for pointing this out.

          Tsz Wo Nicholas Sze added a comment -
               [exec] +1 overall.
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          Passed all tests except TestJobQueueInformation, which is taken care of by HADOOP-4378.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12391794/3883_20081008b.patch
          against trunk revision 703508.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3440/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3440/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3440/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3440/console

          This message is automatically generated.

          dhruba borthakur added a comment -

          +1. Code looks good.

          Tsz Wo Nicholas Sze added a comment -

          I just committed this.

          Hudson added a comment -

          Integrated in Hadoop-trunk #630 (see http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/630/).
          Limit namenode to assign at most one generation stamp for a particular block within a short period. (szetszwo)

          Tsz Wo Nicholas Sze added a comment -

          The patch should also go to 0.18.

          Tsz Wo Nicholas Sze added a comment -

          3883_20081008b_0.18.patch: for 0.18

          Tsz Wo Nicholas Sze added a comment -

          Passed all tests for the 0.18 patch locally.

          I just committed this to 0.18.


            People

            • Assignee: Tsz Wo Nicholas Sze
            • Reporter: Lohit Vijayarenu
            • Votes: 0
            • Watchers: 1
