Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.17.0
    • Component/s: io
    • Labels:
      None

      Description

      TestJobStatusPersistency failed, and the logs contained DataNode stack traces similar to the following:

      2008-03-07 21:27:00,410 ERROR dfs.DataNode (DataNode.java:run(976)) - 127.0.0.1:57790:DataXceiver: java.net.SocketTimeoutException: 0 millis 
      timeout while waiting for Unknown Addr (local: /127.0.0.1:57790) to be ready for read
              at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:188)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:135)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:121)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
              at java.io.DataInputStream.readInt(DataInputStream.java:370)
              at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2434)
              at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1170)
              at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:953)
              at java.lang.Thread.run(Thread.java:619)
      

      This is mostly related to HADOOP-2346. The error is strange: socket.getRemoteSocketAddress() returned null, implying that this socket is not connected yet, but we have already read a few bytes from it!
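
To illustrate why a null remote address reads as "not connected", here is a minimal sketch using only java.net. The class name RemoteAddrDemo and the helper describeRemote are hypothetical, and the "Unknown Addr" fallback merely mirrors the wording in the log above; this is not the DataNode code.

```java
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketAddress;

final class RemoteAddrDemo {
    // Hypothetical helper: falls back to a placeholder when the socket
    // reports no remote endpoint, which is what an unconnected socket does.
    static String describeRemote(Socket socket) {
        SocketAddress remote = socket.getRemoteSocketAddress();
        return remote == null ? "Unknown Addr" : remote.toString();
    }

    public static void main(String[] args) throws Exception {
        // A freshly created, unconnected socket has no remote address.
        try (Socket unconnected = new Socket()) {
            System.out.println(describeRemote(unconnected));
        }
        // Once connected to a local listener, the remote address is set.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket connected = new Socket(server.getInetAddress(), server.getLocalPort())) {
            System.out.println(describeRemote(connected));
        }
    }
}
```

Seeing the placeholder on a socket that has already delivered data is what makes the log line above surprising.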

      1. HADOOP-2971.patch
        3 kB
        Raghu Angadi
      2. HADOOP-2971.patch
        3 kB
        Raghu Angadi

        Activity

        Raghu Angadi added a comment -

        I am willing to bet that, for random reasons, Java select() sometimes returns 0 regardless of the timeout, so we need to keep track of how long we have waited. Oddly, when the test passes, there are no instances of these errors, but when the test fails, there are a lot of them.

        Raghu Angadi added a comment - - edited

        I thought I could avoid calling System.currentTimeMillis() while waiting and depend on select(). Tough luck.

        The attached patch polls in a loop until the timeout has passed. It also removes a large block for setting "channeStr"; we use channel.toString() instead.
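
The idea of polling until the timeout has really elapsed can be sketched roughly as follows. This is not the attached patch: the class name TimedSelect, the method waitForReady, and the exception message are assumptions made for illustration, and only standard java.nio calls are used.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.SocketTimeoutException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.Selector;

final class TimedSelect {
    /**
     * Waits until the (non-blocking) channel is ready for ops, or throws
     * SocketTimeoutException once timeoutMs has actually elapsed.
     * Returns the number of ready keys, which is always greater than 0.
     */
    static int waitForReady(SelectableChannel channel, int ops, long timeoutMs)
            throws IOException {
        if (timeoutMs <= 0) {
            throw new IllegalArgumentException("timeoutMs must be positive");
        }
        Selector selector = Selector.open();
        try {
            channel.register(selector, ops);
            long remaining = timeoutMs;
            while (true) {
                long start = System.currentTimeMillis();
                int ready = selector.select(remaining);
                if (ready > 0) {
                    return ready;
                }
                // An interrupted thread makes select() return at once;
                // bail out instead of spinning until the deadline.
                if (Thread.currentThread().isInterrupted()) {
                    throw new InterruptedIOException("interrupted while waiting for " + channel);
                }
                // select() returned 0, possibly early: subtract only the
                // time that has really passed and retry if any is left.
                remaining -= System.currentTimeMillis() - start;
                if (remaining <= 0) {
                    throw new SocketTimeoutException(
                        timeoutMs + " millis timeout while waiting for " + channel);
                }
            }
        } finally {
            selector.close(); // also deregisters the channel from this selector
        }
    }
}
```

With this shape, a select() call that returns 0 before the deadline only costs one more loop iteration instead of being reported as a timeout.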

        Raghu Angadi added a comment -

        Minor modification to the patch.

        Tsz Wo Nicholas Sze added a comment -

        +1. The code looks good.

        Raghu Angadi added a comment -

        Thanks Nicholas.

        Raghu Angadi added a comment -

        Still waiting for Hudson blessings. Please apply this patch if you are seeing random unit test failures before investigating further.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12377535/HADOOP-2971.patch
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included -1. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests -1. The patch failed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/console

        This message is automatically generated.

        Raghu Angadi added a comment -

        This fixes some of the sporadically failing tests.

        Regarding the failed core tests, I can't trace them to the bug here, but this patch might improve them anyway.

        Raghu Angadi added a comment -

        I just committed this.

        Hudson added a comment -

        Integrated in Hadoop-trunk #426 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/426/ )

          People

          • Assignee:
            Raghu Angadi
            Reporter:
            Raghu Angadi
          • Votes:
            0
            Watchers:
            1
