Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.17.0
    • Component/s: io
    • Labels:
      None

      Description

      TestJobStatusPersistency failed, and its output contained DataNode stack traces similar to the following:

      2008-03-07 21:27:00,410 ERROR dfs.DataNode (DataNode.java:run(976)) - 127.0.0.1:57790:DataXceiver: java.net.SocketTimeoutException: 0 millis 
      timeout while waiting for Unknown Addr (local: /127.0.0.1:57790) to be ready for read
              at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:188)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:135)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:121)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
              at java.io.DataInputStream.readInt(DataInputStream.java:370)
              at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2434)
              at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1170)
              at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:953)
              at java.lang.Thread.run(Thread.java:619)
      

      This is mostly related to HADOOP-2346. The error is strange: socket.getRemoteSocketAddress() returned null, implying that this socket is not connected yet. But we have already read a few bytes from it!
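
      For readers unfamiliar with the API, the following is a minimal sketch of the documented java.net.Socket behavior that the description relies on: getRemoteSocketAddress() returns null for a socket that has never been connected. The class and method names (RemoteAddressCheck, hasRemoteAddress) are hypothetical and chosen only for illustration. The sketch does not reproduce the DataNode condition, where null was returned even though bytes had already been read.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration class; not part of Hadoop or of the attached patch.
class RemoteAddressCheck {

    /** Returns true if the socket reports a remote endpoint. */
    static boolean hasRemoteAddress(Socket socket) {
        return socket.getRemoteSocketAddress() != null;
    }

    public static void main(String[] args) throws IOException {
        // A socket that was never connected has no remote address.
        try (Socket unconnected = new Socket()) {
            System.out.println("unconnected: " + hasRemoteAddress(unconnected));
        }
        // A socket connected to a local listener reports one.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            System.out.println("connected: " + hasRemoteAddress(client));
        }
    }
}
```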

      1. HADOOP-2971.patch
        3 kB
        Raghu Angadi
      2. HADOOP-2971.patch
        3 kB
        Raghu Angadi

        Activity

        Transition                     Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open -> Patch Available        2d 20h 23m              1                 Raghu Angadi    10/Mar/08 18:06
        Patch Available -> Resolved    9h 44m                  1                 Raghu Angadi    11/Mar/08 03:51
        Resolved -> Closed             71d 16h 14m             1                 Nigel Daley     21/May/08 21:05
        Nigel Daley made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hudson added a comment -

        Integrated in Hadoop-trunk #426 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/426/ )

        Raghu Angadi made changes -
        Resolution Fixed [ 1 ]
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Raghu Angadi added a comment -

        I just committed this.

        Raghu Angadi added a comment -

        This fixes some of the sporadically failing tests.

        Regarding the failed core tests: I can't trace them to the bug here, but this patch might improve them anyway.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12377535/HADOOP-2971.patch
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included -1. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests -1. The patch failed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/console

        This message is automatically generated.

        Raghu Angadi added a comment -

        Still waiting for Hudson blessings. Please apply this patch if you are seeing random unit test failures before investigating further.

        Raghu Angadi made changes -
        Fix Version/s 0.17.0 [ 12312913 ]
        Status Open [ 1 ] Patch Available [ 10002 ]
        Raghu Angadi added a comment -

        Thanks Nicholas.

        Tsz Wo Nicholas Sze added a comment -

        +1, the code looks good.

        Raghu Angadi made changes -
        Attachment HADOOP-2971.patch [ 12377535 ]
        Raghu Angadi added a comment -

        Minor modification to the patch.

        Raghu Angadi made changes -
        Field Original Value New Value
        Attachment HADOOP-2971.patch [ 12377412 ]
        Raghu Angadi added a comment - - edited

        I thought I could avoid calling System.currentTimeMillis() while waiting and depend on select() alone. Tough luck.

        The attached patch polls in a loop until the timeout passes. It also removes a large block for setting "channeStr"; we use channel.toString() instead.
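
        The idea described in this comment, polling in a loop and tracking elapsed time rather than trusting a single select() call, can be sketched roughly as follows. This is not the code from HADOOP-2971.patch: the class TimedSelect and the method selectWithDeadline are hypothetical names, and the sketch assumes a plain java.nio Selector with at least one registered channel. It uses System.currentTimeMillis() because the comment mentions it; a monotonic clock such as System.nanoTime() would be a reasonable alternative.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical helper for illustration; not the code from HADOOP-2971.patch.
class TimedSelect {

    /**
     * Waits until at least one registered channel is ready or timeoutMs has
     * elapsed. Returns the ready-key count, or 0 once the full timeout passed.
     */
    static int selectWithDeadline(Selector selector, long timeoutMs) throws IOException {
        if (timeoutMs <= 0) {
            // select(0) would block indefinitely, so poll once instead.
            return selector.selectNow();
        }
        long deadline = System.currentTimeMillis() + timeoutMs;
        long remaining = timeoutMs;
        while (remaining > 0) {
            int ready = selector.select(remaining);
            if (ready > 0) {
                return ready;
            }
            if (Thread.currentThread().isInterrupted()) {
                // select() returns at once while the interrupt flag is set;
                // bail out rather than spin until the deadline.
                throw new InterruptedIOException("interrupted while waiting");
            }
            // A return of 0 may have come early; recompute the time left.
            remaining = deadline - System.currentTimeMillis();
        }
        return 0;
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        try (Selector selector = Selector.open()) {
            pipe.source().register(selector, SelectionKey.OP_READ);

            // Nothing has been written yet, so this waits out the timeout.
            long start = System.currentTimeMillis();
            int ready = selectWithDeadline(selector, 200);
            System.out.println("ready=" + ready + " after "
                    + (System.currentTimeMillis() - start) + " ms");

            // Once a byte is available, the wait ends as soon as it is seen.
            pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));
            System.out.println("ready=" + selectWithDeadline(selector, 1000));
        } finally {
            pipe.sink().close();
            pipe.source().close();
        }
    }
}
```

        With a loop of this shape, a return value of 0 is reported to the caller only after the requested time has really elapsed, so an early or spurious return from select() is not mistaken for a timeout.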

        Raghu Angadi added a comment -

        I am willing to bet that, for random reasons, Java's select() sometimes returns 0 irrespective of the timeout. So we need to keep track of how long we have waited. Oddly, when the test passes, there are no instances of these stack traces, but when the test fails, there are a lot of them.

        Raghu Angadi created issue -

          People

          • Assignee: Raghu Angadi
          • Reporter: Raghu Angadi
          • Votes: 0
          • Watchers: 1
