Hadoop Common
  1. Hadoop Common
  2. HADOOP-6386

NameNode's HttpServer can't instantiate InetSocketAddress: IllegalArgumentException is thrown

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.20.2, 0.21.0, 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Apache Hudson build machine

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Improved initialization sequence so that Port Out of Range error when starting web server will less likely interrupt testing.
    • Tags:
      regression

      Description

      In an execution of a tests the following exception has been thrown:

      Error Message
      
      port out of range:-1
      
      Stacktrace
      java.lang.IllegalArgumentException: port out of range:-1
      	at java.net.InetSocketAddress.<init>(InetSocketAddress.java:118)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:371)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.activate(NameNode.java:313)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:304)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:410)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:404)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1211)
      	at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:287)
      	at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:131)
      	at org.apache.hadoop.hdfs.server.namenode.TestEditLog.testEditLog(TestEditLog.java:92)
      
      1. testEditLog.html
        20 kB
        Konstantin Boudnik
      2. hdfs-771.patch
        2 kB
        Konstantin Boudnik
      3. hdfs-771.patch
        1 kB
        Konstantin Boudnik
      4. HADOOP-6386-0.20.patch
        2 kB
        Konstantin Boudnik
      5. HADOOP-6386-0.20.patch
        2 kB
        Konstantin Boudnik
      6. HADOOP-6386-0.20.patch
        2 kB
        Konstantin Boudnik
      7. HADOOP-6386.patch
        2 kB
        Konstantin Boudnik
      8. HADOOP-6386.patch
        2 kB
        Konstantin Boudnik
      9. HADOOP-6386.patch
        4 kB
        Konstantin Boudnik
      10. HADOOP-6386.patch
        2 kB
        Konstantin Boudnik
      11. HADOOP-6386.patch
        2 kB
        Konstantin Boudnik

        Issue Links

          Activity

          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Konstantin Boudnik made changes -
          Link This issue relates to HADOOP-6760 [ HADOOP-6760 ]
          Tom White made changes -
          Fix Version/s 0.20.2 [ 12314203 ]
          Fix Version/s 0.22.0 [ 12314296 ]
          Konstantin Boudnik made changes -
          Link This issue is related to HADOOP-6528 [ HADOOP-6528 ]
          Konstantin Boudnik made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          Konstantin Boudnik added a comment -

          The fix has been delivered via HADOOP-6428

          Show
          Konstantin Boudnik added a comment - The fix has been delivered via HADOOP-6428
          Robert Chansler made changes -
          Release Note Port Out of Range error when starting web server interrupts testing. Improved initialization sequence so that Port Out of Range error when starting web server will less likely interrupt testing.
          Robert Chansler made changes -
          Release Note Port Out of Range error when starting web server interrupts testing.
          Konstantin Boudnik made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Hide
          Konstantin Boudnik added a comment -

          I keep seeing this problem surfacing here and again.

          Looks like the offered workaround hasn't worked (as well as HADOOP-4744 didn't have a chance) because webServer.getConnectors()[0].getLocalPort() returns positive value while the control is inside of HttpServer object. However, it returns -1 when is inqueried from outside.

          Show
          Konstantin Boudnik added a comment - I keep seeing this problem surfacing here and again. Looks like the offered workaround hasn't worked (as well as HADOOP-4744 didn't have a chance) because webServer.getConnectors() [0] .getLocalPort() returns positive value while the control is inside of HttpServer object. However, it returns -1 when is inqueried from outside.
          Konstantin Boudnik made changes -
          Link This issue is depended upon by HDFS-834 [ HDFS-834 ]
          Konstantin Boudnik made changes -
          Link This issue is duplicated by HDFS-834 [ HDFS-834 ]
          gary murry made changes -
          Link This issue is duplicated by HDFS-834 [ HDFS-834 ]
          Tsz Wo Nicholas Sze made changes -
          Link This issue is related to HADOOP-6428 [ HADOOP-6428 ]
          Konstantin Boudnik made changes -
          Fix Version/s 0.20.2 [ 12314203 ]
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #182 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/182/)
          . NameNode's HttpServer can't instantiate InetSocketAddress: IllegalArgumentException is thrown. Contributed by Konstantin Boudnik.

          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk #182 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/182/ ) . NameNode's HttpServer can't instantiate InetSocketAddress: IllegalArgumentException is thrown. Contributed by Konstantin Boudnik.
          Konstantin Boudnik made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Fix Version/s 0.21.0 [ 12313563 ]
          Fix Version/s 0.22.0 [ 12314296 ]
          Resolution Fixed [ 1 ]
          Hide
          Konstantin Boudnik added a comment -

          I've just committed it to the trunk and branch-0.21.

          Show
          Konstantin Boudnik added a comment - I've just committed it to the trunk and branch-0.21.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #100 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/100/)
          . NameNode's HttpServer can't instantiate InetSocketAddress: IllegalArgumentException is thrown. Contributed by Konstantin Boudnik.

          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #100 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/100/ ) . NameNode's HttpServer can't instantiate InetSocketAddress: IllegalArgumentException is thrown. Contributed by Konstantin Boudnik.
          Hide
          Konstantin Boudnik added a comment -

          Also, here's the .zip file of source code of Jetty 6.1.14 (currently used by Hadoop) available for download.

          Show
          Konstantin Boudnik added a comment - Also, here's the .zip file of source code of Jetty 6.1.14 (currently used by Hadoop) available for download.
          Hide
          Konstantin Shvachko added a comment -

          This patch is safe to commit and try, IMO. It does not make things worse, aothough it is not clear whether it fixes the problem. Current intuition is that there is something wrong with Jetty. We should look more closely in their code. BTW, the sources for Jetty 6.1 are here
          http://jetty.mortbay.org/jetty/jetty-6/xref/index.html
          and they look pretty different from Jetty 5.
          Since this is critical for PIG I'd recommend committing the workaround for now until the problem is fully investigated.

          Show
          Konstantin Shvachko added a comment - This patch is safe to commit and try, IMO. It does not make things worse, aothough it is not clear whether it fixes the problem. Current intuition is that there is something wrong with Jetty. We should look more closely in their code. BTW, the sources for Jetty 6.1 are here http://jetty.mortbay.org/jetty/jetty-6/xref/index.html and they look pretty different from Jetty 5. Since this is critical for PIG I'd recommend committing the workaround for now until the problem is fully investigated.
          Konstantin Boudnik made changes -
          Attachment HADOOP-6386-0.20.patch [ 12427282 ]
          Attachment HADOOP-6386.patch [ 12427283 ]
          Hide
          Konstantin Boudnik added a comment -

          Adding logging statements per Devaraj's comment.

          Show
          Konstantin Boudnik added a comment - Adding logging statements per Devaraj's comment.
          Hide
          Devaraj Das added a comment -

          You may want to put some LOG.warn statements at the places where the port number happens to be unexpected (port < 0). With a hope that this takes care of the race condition, i am okay with the last two patches. If i get some other idea on handling it better i will put a comment here.

          Show
          Devaraj Das added a comment - You may want to put some LOG.warn statements at the places where the port number happens to be unexpected (port < 0). With a hope that this takes care of the race condition, i am okay with the last two patches. If i get some other idea on handling it better i will put a comment here.
          Konstantin Boudnik made changes -
          Attachment HADOOP-6386.patch [ 12427274 ]
          Hide
          Konstantin Boudnik added a comment -

          Addressing Konstantin's comment about un-welcomed collapse of imports list.

          Show
          Konstantin Boudnik added a comment - Addressing Konstantin's comment about un-welcomed collapse of imports list.
          Konstantin Boudnik made changes -
          Attachment HADOOP-6386-0.20.patch [ 12427270 ]
          Attachment HADOOP-6386.patch [ 12427271 ]
          Hide
          Konstantin Boudnik added a comment -

          New version of the patch - there's no need to start/stop listeners, because it is done by the server start/stop methods.

          Show
          Konstantin Boudnik added a comment - New version of the patch - there's no need to start/stop listeners, because it is done by the server start/stop methods.
          Konstantin Boudnik made changes -
          Attachment HADOOP-6386.patch [ 12427262 ]
          Hide
          Konstantin Boudnik added a comment -

          Fixing an issue with path in the patch.

          Show
          Konstantin Boudnik added a comment - Fixing an issue with path in the patch.
          Hide
          Konstantin Boudnik added a comment -

          Not exactly. According to HADOOP-4744 Jetty folks rejected to believe that this is Jetty's problem. As no one was able to provide reproducible enough test case Hadoop's fellas had to workaround it back then. It seems that we need to workaround it again

          Show
          Konstantin Boudnik added a comment - Not exactly. According to HADOOP-4744 Jetty folks rejected to believe that this is Jetty's problem. As no one was able to provide reproducible enough test case Hadoop's fellas had to workaround it back then. It seems that we need to workaround it again
          Show
          Todd Lipcon added a comment - Another example of this test failure: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/166/testReport/org.apache.hadoop.mapred/TestMiniMRWithDFS/testWithDFSWithDefaultPort/ Do we have a Jetty JIRA open for this?
          Hide
          Konstantin Boudnik added a comment -

          You've brought up an interesting issue. According to the latest Jetty source code
          Container.java first calls polymorphic doStart() and then sets _started flag to true. I.e.

                      doStart();
                      _started=true;
          

          Now, HttpServer's doStart() method JavaDoc states that

              /** Start all handlers then listeners.
               * If a subcomponent fails to start, it's exception is added to a
               * org.mortbay.util.MultiException and the start method continues.
          

          So, strictly speaking this isn't an asynchronous method. Besides, if something would go wrong we are likely to see an MultiException thrown by this 'failing' server instance. However, we don't see any. Which brings me to the conclusion that server seems to start properly and isStarted() is going to return true in this case.

          There's still a chance that we are dealing with a race condition in the Jetty server, but it isn't apparent what's causing it ;-(

          Show
          Konstantin Boudnik added a comment - You've brought up an interesting issue. According to the latest Jetty source code Container.java first calls polymorphic doStart() and then sets _started flag to true . I.e. doStart(); _started=true; Now, HttpServer 's doStart() method JavaDoc states that /** Start all handlers then listeners. * If a subcomponent fails to start, it's exception is added to a * org.mortbay.util.MultiException and the start method continues. So, strictly speaking this isn't an asynchronous method. Besides, if something would go wrong we are likely to see an MultiException thrown by this 'failing' server instance. However, we don't see any. Which brings me to the conclusion that server seems to start properly and isStarted() is going to return true in this case. There's still a chance that we are dealing with a race condition in the Jetty server, but it isn't apparent what's causing it ;-(
          Hide
          Konstantin Shvachko added a comment -

          My understanding was that getPort() is called before the server actually started, because HttpServer.start() is asynchronous. Therefore I believe isStarted() should help. Do you think it's a different problem?

          Show
          Konstantin Shvachko added a comment - My understanding was that getPort() is called before the server actually started, because HttpServer.start() is asynchronous. Therefore I believe isStarted() should help. Do you think it's a different problem?
          Hide
          Konstantin Boudnik added a comment -

          Said port number is positive before the start of the server. However, after the supposed start the local port number turns to be -1 which causes the very error we're seeing.
          I'm also not sure if checking that the component in question has been started will be indicative.

          Show
          Konstantin Boudnik added a comment - Said port number is positive before the start of the server. However, after the supposed start the local port number turns to be -1 which causes the very error we're seeing. I'm also not sure if checking that the component in question has been started will be indicative.
          Hide
          Konstantin Shvachko added a comment -

          Not sure this is going to help. From the printout we know that the port you are checking in the if statement is not -1, so the workaround code is not called. If I don't miss something.
          I guess we can check webServer.isStarted() before returning from HttpServer.start().

          Show
          Konstantin Shvachko added a comment - Not sure this is going to help. From the printout we know that the port you are checking in the if statement is not -1, so the workaround code is not called. If I don't miss something. I guess we can check webServer.isStarted() before returning from HttpServer.start() .
          Konstantin Boudnik made changes -
          Attachment HADOOP-6386-0.20.patch [ 12426124 ]
          Hide
          Konstantin Boudnik added a comment -

          Same for 0.20

          Show
          Konstantin Boudnik added a comment - Same for 0.20
          Konstantin Boudnik made changes -
          Attachment HADOOP-6386.patch [ 12426123 ]
          Hide
          Konstantin Boudnik added a comment -

          Slightly more intelligent version of the patch.

          Show
          Konstantin Boudnik added a comment - Slightly more intelligent version of the patch.
          Konstantin Boudnik made changes -
          Project Hadoop HDFS [ 12310942 ] Hadoop Common [ 12310240 ]
          Key HDFS-771 HADOOP-6386
          Affects Version/s 0.20.2 [ 12314203 ]
          Affects Version/s 0.21.0 [ 12313563 ]
          Affects Version/s 0.22.0 [ 12314296 ]
          Affects Version/s 0.22.0 [ 12314241 ]
          Component/s name-node [ 12312926 ]
          Konstantin Boudnik made changes -
          Attachment hdfs-771.patch [ 12425920 ]
          Hide
          Konstantin Boudnik added a comment -

          Patch for 0.20

          Show
          Konstantin Boudnik added a comment - Patch for 0.20
          Hide
          Konstantin Boudnik added a comment -

          I've ran few hundreds iterations of bringing up and shutting down the MiniDFSCluster with and without this patch in place and haven't see a single issue. Which doesn't prove though that the patch doesn't work

          Show
          Konstantin Boudnik added a comment - I've ran few hundreds iterations of bringing up and shutting down the MiniDFSCluster with and without this patch in place and haven't see a single issue. Which doesn't prove though that the patch doesn't work
          Konstantin Boudnik made changes -
          Attachment hdfs-771.patch [ 12425677 ]
          Hide
          Konstantin Boudnik added a comment -

          Strictly speaking it should be responsibility of HttpServer to make guarantees that webserver is up and running and declared ports are actually bound.

          Thus, here's the patch which hopefully will reduce the number of seeing errors like this one. The patch doesn't solve the underlying problem which seems to be rather a result of a race condition in jetty. However, it should do the job.

          Show
          Konstantin Boudnik added a comment - Strictly speaking it should be responsibility of HttpServer to make guarantees that webserver is up and running and declared ports are actually bound. Thus, here's the patch which hopefully will reduce the number of seeing errors like this one. The patch doesn't solve the underlying problem which seems to be rather a result of a race condition in jetty. However, it should do the job.
          Konstantin Boudnik made changes -
          Assignee Konstantin Boudnik [ cos ]
          Hide
          Konstantin Boudnik added a comment -

          After all it seems to be a race condition in the Jetty, e.g. (NameNode:367)

              this.httpServer.start();
          

          Appropriate log file

          2009-11-13 07:02:04,605 INFO  http.HttpServer (HttpServer.java:start(432)) - Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 0
          2009-11-13 07:02:04,606 INFO  http.HttpServer (HttpServer.java:start(437)) - listener.getLocalPort() returned 37817 webServer.getConnectors()[0].getLocalPort() returned 37817
          2009-11-13 07:02:04,607 INFO  http.HttpServer (HttpServer.java:start(470)) - Jetty bound to port 37817
          2009-11-13 07:02:04,607 INFO  mortbay.log (?:invoke0(?)) - jetty-6.1.14
          2009-11-13 07:03:04,231 INFO  mortbay.log (?:invoke0(?)) - Started SelectChannelConnector@localhost:37817
          

          And the this code is executed (NameNode:370-371)

              // The web-server port can be ephemeral... ensure we have the correct info
              infoPort = this.httpServer.getPort();
              this.httpAddress = new InetSocketAddress(infoHost, infoPort);
          

          and this.httpServer.getPort(); returns -1 as the infoPort value

          I'll try to work out a minimal test case to reproduce this problem, however it might be hard.

          Show
          Konstantin Boudnik added a comment - After all it seems to be a race condition in the Jetty, e.g. (NameNode:367) this.httpServer.start(); Appropriate log file 2009-11-13 07:02:04,605 INFO http.HttpServer (HttpServer.java:start(432)) - Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 0 2009-11-13 07:02:04,606 INFO http.HttpServer (HttpServer.java:start(437)) - listener.getLocalPort() returned 37817 webServer.getConnectors()[0].getLocalPort() returned 37817 2009-11-13 07:02:04,607 INFO http.HttpServer (HttpServer.java:start(470)) - Jetty bound to port 37817 2009-11-13 07:02:04,607 INFO mortbay.log (?:invoke0(?)) - jetty-6.1.14 2009-11-13 07:03:04,231 INFO mortbay.log (?:invoke0(?)) - Started SelectChannelConnector@localhost:37817 And the this code is executed (NameNode:370-371) // The web-server port can be ephemeral... ensure we have the correct info infoPort = this.httpServer.getPort(); this.httpAddress = new InetSocketAddress(infoHost, infoPort); and this.httpServer.getPort(); returns -1 as the infoPort value I'll try to work out a minimal test case to reproduce this problem, however it might be hard.
          Konstantin Boudnik made changes -
          Link This issue relates to HADOOP-4744 [ HADOOP-4744 ]
          Konstantin Boudnik made changes -
          Summary Jetty crashes: MiniDFSCluster supplies incorrect port number to the NameNode NameNode's HttpServer can't instantiate InetSocketAddress: IllegalArgumentException is thrown
          Tags regression
          Priority Major [ 3 ] Blocker [ 1 ]
          Component/s name-node [ 12312926 ]
          Component/s test [ 12312916 ]
          Hide
          Konstantin Boudnik added a comment -

          This issue seems to be a regression (or rather the result of incomplete fix of HADOOP-4744)
          The problem is surfacing quite often in Pig (a few times per week). So, I'm raising the priority to the Blocker, cause all components are affected by this issue.

          Show
          Konstantin Boudnik added a comment - This issue seems to be a regression (or rather the result of incomplete fix of HADOOP-4744 ) The problem is surfacing quite often in Pig (a few times per week). So, I'm raising the priority to the Blocker, cause all components are affected by this issue.
          Konstantin Boudnik made changes -
          Link This issue relates to MAPREDUCE-1179 [ MAPREDUCE-1179 ]
          Hide
          Konstantin Boudnik added a comment -

          The environment and all are available from Hudson build

          Show
          Konstantin Boudnik added a comment - The environment and all are available from Hudson build
          Konstantin Boudnik made changes -
          Field Original Value New Value
          Attachment testEditLog.html [ 12424857 ]
          Hide
          Konstantin Boudnik added a comment -

          Full log of the test execution

          Show
          Konstantin Boudnik added a comment - Full log of the test execution
          Konstantin Boudnik created issue -

            People

            • Assignee:
              Konstantin Boudnik
              Reporter:
              Konstantin Boudnik
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development