Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.17.0
    • Component/s: io
    • Labels:
      None

      Description

      TestJobStatusPersistency failed, and its output contained DataNode stack traces similar to the following:

      2008-03-07 21:27:00,410 ERROR dfs.DataNode (DataNode.java:run(976)) - 127.0.0.1:57790:DataXceiver: java.net.SocketTimeoutException: 0 millis timeout while waiting for Unknown Addr (local: /127.0.0.1:57790) to be ready for read
              at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:188)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:135)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:121)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
              at java.io.DataInputStream.readInt(DataInputStream.java:370)
              at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2434)
              at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1170)
              at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:953)
              at java.lang.Thread.run(Thread.java:619)
      

      This is most likely related to HADOOP-2346. The error is strange: socket.getRemoteSocketAddress() returned null, implying that this socket is not connected yet, even though we have already read a few bytes from it!
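For reference, the "null means not connected" reading can be checked against the stock JDK. The sketch below is not Hadoop code; the class and method names are hypothetical, and it assumes a loopback interface is available. It only shows that java.net.Socket.getRemoteSocketAddress() returns null for a socket that was never connected and a non-null address for a connected one.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketAddress;

// Hypothetical helper class, used only to illustrate JDK behavior.
class RemoteAddressCheck {

    /** Remote address reported by a socket that was never connected. */
    static SocketAddress unconnectedRemote() throws IOException {
        try (Socket s = new Socket()) {
            return s.getRemoteSocketAddress();
        }
    }

    /** Remote address reported by a socket connected to a loopback listener. */
    static SocketAddress connectedRemote() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port; the connect completes
        // against the listen backlog, so no accept() call is needed here.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            return client.getRemoteSocketAddress();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unconnected: " + unconnectedRemote());
        System.out.println("connected:   " + connectedRemote());
    }
}
```

A null remote address on a socket that has already delivered data is therefore unexpected, which is why the trace above looks strange.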

      1. HADOOP-2971.patch
        3 kB
        Raghu Angadi
      2. HADOOP-2971.patch
        3 kB
        Raghu Angadi

        Activity

        rangadi Raghu Angadi added a comment -

        I am willing to bet that, for random reasons, Java's select() returns 0 irrespective of the timeout, so we need to keep track of how long we have waited. Oddly, when the test passes there are no instances of this error, but when the test fails there are a lot of them.

        rangadi Raghu Angadi added a comment - - edited

        I thought I could avoid calling System.currentTimeMillis() while waiting and depend on select() alone. Tough luck.

        The attached patch polls in a loop until the timeout passes. It also removes a large block for setting "channeStr"; we use channel.toString() instead.
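The approach described in this comment can be sketched roughly as follows. This is not the attached patch: the class name TimedSelect and the method waitReady are hypothetical, the sketch opens a fresh Selector per call for simplicity, and it treats a non-positive timeout as already expired. It only illustrates the idea of re-entering select() and charging the elapsed time against the timeout whenever select() returns 0.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical helper class; an illustration, not the Hadoop implementation.
class TimedSelect {

    /**
     * Waits until the (non-blocking) channel is ready for the given ops.
     * Returns the number of ready keys, or throws SocketTimeoutException
     * once timeoutMs milliseconds have actually elapsed.
     */
    static int waitReady(SelectableChannel channel, int ops, long timeoutMs)
            throws IOException {
        Selector selector = Selector.open();
        try {
            channel.register(selector, ops);
            long remaining = timeoutMs;
            while (remaining > 0) {
                long start = System.currentTimeMillis();
                int ready = selector.select(remaining);
                if (ready > 0) {
                    return ready;
                }
                // select() returned 0, possibly early: subtract only the
                // time that really passed and wait again for the rest.
                remaining -= System.currentTimeMillis() - start;
            }
            throw new SocketTimeoutException(timeoutMs
                    + " millis timeout while waiting for " + channel
                    + " to be ready");
        } finally {
            // Closing the selector also deregisters the channel.
            selector.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        // Channels must be non-blocking before they can be registered.
        pipe.source().configureBlocking(false);
        try {
            waitReady(pipe.source(), SelectionKey.OP_READ, 100);
        } catch (SocketTimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

With this shape, a spurious early return from select() costs one extra loop iteration instead of producing a premature timeout, and channel.toString() supplies the description in the exception message.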

        rangadi Raghu Angadi added a comment -

        Minor modification to the patch.

        szetszwo Tsz Wo Nicholas Sze added a comment -

        +1. The code looks good.

        rangadi Raghu Angadi added a comment -

        Thanks Nicholas.

        rangadi Raghu Angadi added a comment -

        Still waiting for Hudson blessings. Please apply this patch if you are seeing random unit test failures before investigating further.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12377535/HADOOP-2971.patch
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included -1. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests -1. The patch failed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1931/console

        This message is automatically generated.

        rangadi Raghu Angadi added a comment -

        This fixes some of the sporadically failing tests.

        Regarding the failed core tests, I can't trace them to the bug here, but this patch might improve things anyway.

        rangadi Raghu Angadi added a comment -

        I just committed this.

        hudson Hudson added a comment -

        Integrated in Hadoop-trunk #426 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/426/ )

          People

          • Assignee:
            rangadi Raghu Angadi
            Reporter:
            rangadi Raghu Angadi
          • Votes:
            0
            Watchers:
            1
