Hadoop Common / HADOOP-2974

ipc unit tests fail due to connection errors

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.17.0
    • Component/s: ipc
    • Labels:
      None
    • Environment:

      windows

      Description

      ipc unit tests fail due to connection errors:

      Failing tests:
      org.apache.hadoop.ipc.TestIPC.unknown
      org.apache.hadoop.ipc.TestIPCServerResponder.unknown
      org.apache.hadoop.ipc.TestRPC.testSlowRpc
      org.apache.hadoop.ipc.TestRPC.testCalls

      Changes:

      1. HADOOP-2346. Utilities to support timeout while writing to sockets. DFSClient and DataNode sockets have 10min write timeout.
      2. HADOOP-2906. Add an OutputFormat capable of using keys, values, and config params to map records to different output files.
      3. HADOOP-2756. NPE in DFSClient while closing DFSOutputStreams under load.
      4. HADOOP-2934. The namenode was encountering a NPE while loading leases from the fsimage. Fixed.
      5. HADOOP-2925. Fix HOD to create mapred system directory using a naming convention that will avoid clashes in multi-user shared cluster scenario.
      6. HADOOP-2911. Make the information printed by the HOD allocate and info commands less verbose and clearer.
      7. HADOOP-2883. Write failures and data corruptions on HDFS files. The write timeout is back to what it was on 0.15 release. Also, the datanode flushes the block file buffered output stream before sending a positive ack for the packet back to the client.
      8. HADOOP-2861. INCOMPATIBLE CHANGE. Improve the user interface for the HOD commands. Command line structure has changed.

      Error logs:
      [junit] Running org.apache.hadoop.ipc.TestIPC
      [junit] 2008-03-07 10:50:04,291 INFO metrics.RpcMetrics (RpcMetrics.java:<init>(53)) - Initializing RPC Metrics with hostName=0, port=4785
      [junit] 2008-03-07 10:50:04,354 INFO ipc.Server (Server.java:run(443)) - IPC Server Responder: starting
      [junit] 2008-03-07 10:50:04,369 INFO ipc.Server (Server.java:run(303)) - IPC Server listener on 4785: starting
      [junit] 2008-03-07 10:50:04,369 INFO ipc.Server (Server.java:run(861)) - IPC Server handler 0 on 4785: starting
      [junit] 2008-03-07 10:50:04,369 INFO ipc.Server (Server.java:run(861)) - IPC Server handler 1 on 4785: starting
      [junit] 2008-03-07 10:50:04,369 INFO ipc.Server (Server.java:run(861)) - IPC Server handler 2 on 4785: starting
      [junit] 2008-03-07 10:50:04,432 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 1 time(s).
      [junit] 2008-03-07 10:50:04,432 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 1 time(s).
      [junit] 2008-03-07 10:50:05,432 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 2 time(s).
      [junit] 2008-03-07 10:50:05,432 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 2 time(s).
      [junit] 2008-03-07 10:50:06,432 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 3 time(s).
      [junit] 2008-03-07 10:50:06,432 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 3 time(s).
      [junit] 2008-03-07 10:50:07,433 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 4 time(s).
      [junit] 2008-03-07 10:50:07,433 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 4 time(s).
      [junit] 2008-03-07 10:50:08,433 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 5 time(s).
      [junit] 2008-03-07 10:50:08,433 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 5 time(s).
      [junit] 2008-03-07 10:50:09,433 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 6 time(s).
      [junit] 2008-03-07 10:50:09,433 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 6 time(s).
      [junit] 2008-03-07 10:50:10,434 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 7 time(s).
      [junit] 2008-03-07 10:50:10,434 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 7 time(s).
      [junit] 2008-03-07 10:50:11,434 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 8 time(s).
      [junit] 2008-03-07 10:50:11,434 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 8 time(s).
      [junit] 2008-03-07 10:50:12,434 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 9 time(s).
      [junit] 2008-03-07 10:50:12,434 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 9 time(s).
      [junit] 2008-03-07 10:50:13,434 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 10 time(s).
      [junit] 2008-03-07 10:50:13,434 INFO ipc.Client (Client.java:setupIOstreams(177)) - Retrying connect to server: /0.0.0.0:4785. Already tried 10 time(s).
      [junit] 2008-03-07 10:50:14,435 FATAL ipc.TestIPC (TestIPC.java:run(92)) - Caught: java.net.BindException: Cannot assign requested address: no further information
      [junit] 2008-03-07 10:50:14,435 FATAL ipc.TestIPC (TestIPC.java:run(92)) - Caught: java.net.BindException: Cannot assign requested address: no further information

      1. HADOOP-2974.patch (5 kB, Raghu Angadi)
      2. HADOOP-2974.patch (4 kB, Raghu Angadi)

        Activity

        Mukund Madhugiri created issue -
        Raghu Angadi made changes -
        Assignee: Robert Chansler [ chansler ] -> Raghu Angadi [ rangadi ]
        Raghu Angadi added a comment - - edited

        Sigh. This fails with Java 1.5 but not with Java 1.6 on Windows.

        The only change from HADOOP-2346 that matters is that the Socket used in the following trace is created with "SocketChannel.open().socket()" instead of "new Socket()", so it's an NIO socket. I don't know why it does not work. Does anyone know if NIO is supposed to be stable in 1.5?

        Exception:

        Cannot assign requested address: no further information
        java.net.BindException: Cannot assign requested address: no further information
                at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
                at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:527)
                at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
                at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:161)
                at org.apache.hadoop.ipc.Client.getConnection(Client.java:578)
                at org.apache.hadoop.ipc.Client.call(Client.java:501)
                at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
                at $Proxy0.getProtocolVersion(Unknown Source)
                at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:291)
                at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:278)
                at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:315)
                at org.apache.hadoop.ipc.TestRPC.testSlowRpc(TestRPC.java:179)
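
        The difference between the two ways of creating the socket mentioned above can be seen in a minimal sketch. It relies only on the JDK fact that a socket obtained from a SocketChannel reports that channel through getChannel(), while a socket made with "new Socket()" reports none. The class and helper names are hypothetical and are not taken from the Hadoop sources.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.nio.channels.SocketChannel;

public class SocketKinds {
    // Hypothetical helper: a socket created directly has no associated channel.
    static boolean plainSocketHasChannel() {
        try (Socket s = new Socket()) {
            return s.getChannel() != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: a socket obtained from a SocketChannel is backed by that channel.
    static boolean nioSocketHasChannel() {
        try (SocketChannel ch = SocketChannel.open()) {
            return ch.socket().getChannel() != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("new Socket() has channel: " + plainSocketHasChannel());
        System.out.println("SocketChannel.open().socket() has channel: " + nioSocketHasChannel());
    }
}
```

        For a channel-backed socket, connect() is carried out through the channel, which is why the trace above passes through sun.nio.ch.SocketAdaptor and SocketChannelImpl.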
        
        Raghu Angadi added a comment -

        Given this JIRA and HADOOP-2971, it might be better to revert HADOOP-2346 for now. Some of these might not be in our hands to fix. HADOOP-2346 is implemented based on the JavaDoc for SelectableChannels. I will see.

        Raghu Angadi added a comment -


        As a temporary workaround, StandardSocketFactory.createSocket() could return 'new Socket()' (either on all platforms or just on Windows). This implies client sockets will not have a write timeout on these platforms. Any preferences?
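
        A rough sketch of the Windows-only variant of this workaround follows. It is not the actual StandardSocketFactory code: the class name, the isWindows helper, and the use of the os.name system property for platform detection are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.Socket;
import java.nio.channels.SocketChannel;
import java.util.Locale;

public class FallbackSocketFactorySketch {
    // Hypothetical helper: true when the given os.name value denotes Windows.
    static boolean isWindows(String osName) {
        return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // On Windows, return a plain socket (no channel, hence no channel-based
    // write timeout); elsewhere, return a channel-backed NIO socket.
    static Socket createSocket(String osName) throws IOException {
        if (isWindows(osName)) {
            return new Socket();
        }
        return SocketChannel.open().socket();
    }

    public static void main(String[] args) throws IOException {
        try (Socket s = createSocket(System.getProperty("os.name"))) {
            System.out.println("channel-backed: " + (s.getChannel() != null));
        }
    }
}
```

        The trade-off is the one stated in the comment: wherever the plain socket is returned, the client loses the write timeout.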

        Raghu Angadi added a comment -

        The relevant Java bug seems to be http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6269047

        These tests essentially invoke connect("0.0.0.0:port") which does not work with channels before Java 1.6. We could change the address to 127.0.0.1 in the test, which seems to work.

        Raghu Angadi added a comment -

        Patch attached. This adds a static method NetUtils.getConnectAddress(Server). It returns "127.0.0.1:port" when server binds to "0.0.0.0:port". The failing tests now use this method instead of server.getListeningAddress().

        Note that NameNode and JobTracker also use server.getListeningAddress(), which is not strictly correct. I didn't change that.
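
        The mapping described in this comment can be sketched as follows. This is not the patch itself: the patched method takes a Server, whereas this hypothetical stand-in takes the bound InetSocketAddress directly so that it runs without Hadoop on the classpath. The class and method names are assumptions.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

public class ConnectAddressSketch {
    // Hypothetical stand-in for NetUtils.getConnectAddress(Server):
    // if the server is bound to the wildcard address (0.0.0.0), return
    // 127.0.0.1 with the same port; otherwise return the address unchanged.
    static InetSocketAddress toConnectAddress(InetSocketAddress listenAddr) {
        InetAddress ip = listenAddr.getAddress();
        if (ip != null && ip.isAnyLocalAddress()) {
            return new InetSocketAddress("127.0.0.1", listenAddr.getPort());
        }
        return listenAddr;
    }

    public static void main(String[] args) {
        InetSocketAddress wildcard = new InetSocketAddress("0.0.0.0", 4785);
        InetSocketAddress target = toConnectAddress(wildcard);
        System.out.println(target.getAddress().getHostAddress() + ":" + target.getPort());
    }
}
```

        A client would then connect to the returned address rather than to the wildcard address the server reports as its listening address.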

        Raghu Angadi made changes -
        Attachment: HADOOP-2974.patch [ 12377542 ]
        Hairong Kuang added a comment -

        +1 on the patch. For the problem in JobTracker and NameNode, could you file a bug if it does not get solved in this jira?

        Raghu Angadi added a comment -

        Thanks, Hairong. Filed HADOOP-2989 for NameNode and JobTracker.

        Raghu Angadi made changes -
        Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12377542/HADOOP-2974.patch
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 9 new or modified tests.

        patch -1. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1936/console

        This message is automatically generated.

        Raghu Angadi made changes -
        Status: Patch Available [ 10002 ] -> Open [ 1 ]
        Raghu Angadi added a comment -

        Fixed conflicts with trunk. I should have suspected this earlier.

        Raghu Angadi made changes -
        Attachment: HADOOP-2974.patch [ 12377623 ]
        Raghu Angadi made changes -
        Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12377623/HADOOP-2974.patch
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 9 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1946/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1946/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1946/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1946/console

        This message is automatically generated.

        Raghu Angadi added a comment -

        I just committed this.

        Raghu Angadi made changes -
        Resolution: Fixed [ 1 ]
        Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
        Hudson added a comment -

        Integrated in Hadoop-trunk #427 (see http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/427/)

        Nigel Daley made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]

          People

          • Assignee: Raghu Angadi
          • Reporter: Mukund Madhugiri
          • Votes: 0
          • Watchers: 0
