Hadoop Common / HADOOP-6528

Jetty returns -1, causing Hadoop masters/slaves to fail during startup.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not a Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      A recent test failure on Hudson seems to indicate that Jetty's Server.getConnectors()[0].getLocalPort() is returning -1 in the HttpServer.getPort() method. When this happens, Hadoop masters / slaves that use Jetty fail to startup correctly.
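      For illustration, the failing call chain amounts to something like the sketch below (a hypothetical reconstruction against the Jetty 6 API, not the actual Hadoop source):

          // Hypothetical sketch of the call chain described above;
          // simplified for illustration, not the actual Hadoop source.
          import java.net.InetSocketAddress;
          import org.mortbay.jetty.Server;

          class HttpServerSketch {
              private final Server webServer;

              HttpServerSketch(Server webServer) {
                  this.webServer = webServer;
              }

              int getPort() {
                  // Jetty occasionally reports -1 here even after start().
                  return webServer.getConnectors()[0].getLocalPort();
              }

              InetSocketAddress infoAddress(String host) {
                  // With port == -1 this throws
                  // java.lang.IllegalArgumentException: port out of range:-1
                  return new InetSocketAddress(host, getPort());
              }
          }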


          Activity

          Hemanth Yamijala added a comment -

          The test failure in question can be seen here.

          The test was starting a NameNode as part of a MiniDFSCluster. In the process of doing this, I see the following in the log file:

          2010-01-30 16:02:53,658 INFO http.HttpServer (HttpServer.java:start(442)) - Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 0
          2010-01-30 16:02:53,659 INFO http.HttpServer (HttpServer.java:start(447)) - listener.getLocalPort() returned 38815 webServer.getConnectors()[0].getLocalPort() returned 38815
          2010-01-30 16:02:53,659 INFO http.HttpServer (HttpServer.java:start(480)) - Jetty bound to port 38815

          However, the NameNode initialization failed with:

          java.lang.IllegalArgumentException: port out of range:-1
          	at java.net.InetSocketAddress.<init>(InetSocketAddress.java:118)
          	at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:377)
          	at org.apache.hadoop.hdfs.server.namenode.NameNode.activate(NameNode.java:317)
          	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:308)
          	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:421)
          	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:410)
          	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1261)
          	at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:278)
          	at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:127)
          	at org.apache.hadoop.mapred.ClusterMapReduceTestCase.startCluster(ClusterMapReduceTestCase.java:81)
          	at org.apache.hadoop.mapred.ClusterMapReduceTestCase.setUp(ClusterMapReduceTestCase.java:56)
          

          The call in the NameNode just before this socket creation is HttpServer.getPort(), which seems to be returning -1.

          A similar failure has been seen before in Map/Reduce and was tracked in HADOOP-4744. After some struggle trying to determine the root cause along with the Jetty folks, I think we worked around it in Map/Reduce by reinitializing the MR server whenever getPort() returns -1. I think we need a more permanent fix, as this is now being seen in other daemons as well.
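
          (For reference, that HADOOP-4744-style workaround amounts to something like the following sketch; the names and the retry bound are illustrative, not taken from the actual patch:)

              import java.io.IOException;
              import java.util.concurrent.Callable;

              // Hypothetical stand-in for Hadoop's HttpServer handle.
              interface ServerHandle {
                  int getPort();            // may return -1 (the bug discussed here)
                  void stop() throws IOException;
              }

              class RetrySketch {
                  static final int MAX_RETRIES = 10; // illustrative bound

                  // Keep rebuilding the server until Jetty reports a usable
                  // port, mirroring the workaround described above.
                  static ServerHandle startWithRetry(Callable<ServerHandle> create)
                          throws Exception {
                      for (int i = 0; i < MAX_RETRIES; i++) {
                          ServerHandle server = create.call(); // build and start Jetty
                          if (server.getPort() != -1) {
                              return server;                   // sane port, done
                          }
                          server.stop();                       // tear down and retry
                      }
                      throw new IOException("Jetty kept returning -1 for the local port");
                  }
              }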

          Hemanth Yamijala added a comment -

          Attaching Hudson's full log for reference.

          Konstantin Boudnik added a comment -

          This problem has now been attacked for a second time, in HADOOP-6386 (and the follow-up HADOOP-6428).
          The code which produces it now has two workarounds to guarantee that a negative port can't happen at all. What happens, however - and we now have a clear indication of it - is a race condition inside the Jetty server: the port value is positive inside NameNode.startHttpServer but negative when listener.getLocalPort() is called from outside the webserver context.
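
          (As a toy illustration of the class of race being suspected here - a field written by one thread observed stale by another when nothing orders the accesses - consider the following; this is a generic Java memory model example, not Jetty's actual code:)

              // Toy illustration of the suspected class of race: without
              // volatile or synchronization, a reader thread may still observe
              // the stale -1 after the writer thread has set the real port.
              class PortHolder {
                  private int port = -1;              // deliberately NOT volatile

                  void open() { port = 38815; }       // writer thread sets the real port

                  int getLocalPort() { return port; } // reader thread may still see -1
              }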

          Hemanth Yamijala added a comment -

          The right fix seems to be in Jetty, right? Should we ask the Jetty folks to assist? I see there's already an open JIRA - JETTY-748 - talking about this.

          Vinod Kumar Vavilapalli added a comment -

          We are hitting this a lot in the recent Hudson tests. We need to do something here. I am going to comment on JETTY-748.

          Greg Wilkins added a comment -

          Hi,

          We've been looking at the Jetty code again and again and can't see how a volatile, a synchronized block, or any memory barrier could help - unless it is an issue within the JVM libraries themselves.

          We've been running stop/start tests but cannot reproduce this either.

          Is it possible that something else in Hadoop is stopping the server before the call to get the port?

          Can we get full details of the platforms on which this fails - operating system, JVM version, etc.?

          Greg Wilkins added a comment -

          I'm still pondering this one... and still can't think of an explanation other than something in Hadoop calling close().

          I've attached a patch to http://jira.codehaus.org/browse/JETTY-748 that changes the way localPort is handled: instead of calling socket.getLocalPort() every time connector.getLocalPort() is called, the patch calls socket.getLocalPort() once in the open method and holds the result in a volatile int. If close is called, the int is set to -2, so this will tell us whether the port was never opened or was closed before it was used.

          This patch is not yet in a release, but if somebody has a setup where this problem occurs, then we can produce a snapshot build for you to test against.
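
          (In outline, the approach described above amounts to something like this sketch; simplified, not the actual Jetty patch:)

              import java.io.IOException;
              import java.net.ServerSocket;

              // Sketch of the JETTY-748 approach described above: capture the
              // local port once in open() into a volatile field, and mark it
              // -2 in close() so a later reading distinguishes "never opened"
              // (-1) from "closed before use" (-2).
              class ConnectorSketch {
                  private ServerSocket socket;
                  private volatile int localPort = -1;   // -1: not opened yet

                  void open(int requestedPort) throws IOException {
                      socket = new ServerSocket(requestedPort);
                      localPort = socket.getLocalPort(); // captured exactly once
                  }

                  void close() throws IOException {
                      socket.close();
                      localPort = -2;                    // -2: closed
                  }

                  int getLocalPort() {
                      return localPort;                  // no live call into the socket
                  }
              }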

          Allen Wittenauer added a comment -

          Hello from the future!

          At this point, I'm assuming this issue has been fixed in one way or another. I know that as of this writing, parts of Hadoop are using Jetty 6.1.26 and other parts are using Netty 3.6.2. Additionally, JRE 6 has been EOLed by Oracle.

          I think I'll close this out as Not A Problem, because as far as I'm aware, this is no longer an active issue for Hadoop.


            People

            • Assignee: Unassigned
            • Reporter: Hemanth Yamijala
            • Votes: 0
            • Watchers: 5
