Hadoop YARN / YARN-4251

TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is failing

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: test
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Trace

      org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [0.0.0.0:9030] java.net.BindException: Address already in use: bind; For more details see:  http://wiki.apache.org/hadoop/BindException
      	at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
      	at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
      	at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
      	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:143)
      	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
      	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
      	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:592)
      	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
      	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:975)
      	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1016)
      	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Unknown Source)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
      	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1012)
      	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1052)
      	at org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyResourceManager.serviceStart(TestAMRMClientOnRMRestart.java:560)
      	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
      	at org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientOnAMRMTokenRollOverOnRMRestart(TestAMRMClientOnRMRestart.java:463)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      	at java.lang.reflect.Method.invoke(Unknown Source)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
      Caused by: java.net.BindException: Problem binding to [0.0.0.0:9030] java.net.BindException: Address already in use: bind; For more details see:  http://wiki.apache.org/hadoop/BindException
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
      	at java.lang.reflect.Constructor.newInstance(Unknown Source)
      	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
      	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
      	at org.apache.hadoop.ipc.Server.bind(Server.java:486)
      	at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:646)
      	at org.apache.hadoop.ipc.Server.<init>(Server.java:2399)
      	at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:946)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:537)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510)
      	at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:787)
      	at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169)
      	at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
      	... 27 more
      Caused by: java.net.BindException: Address already in use: bind
      	at sun.nio.ch.Net.bind0(Native Method)
      	at sun.nio.ch.Net.bind(Unknown Source)
      	at sun.nio.ch.Net.bind(Unknown Source)
      	at sun.nio.ch.ServerSocketChannelImpl.bind(Unknown Source)
      	at sun.nio.ch.ServerSocketAdaptor.bind(Unknown Source)
      	at org.apache.hadoop.ipc.Server.bind(Server.java:469)
      	... 35 more
      
      
      1. YARN-4251.patch
        2 kB
        Brahma Reddy Battula

          Activity

          brahmareddy Brahma Reddy Battula added a comment -

          The scheduler port is hardcoded. I have uploaded a patch that takes a random port instead.

          conf.set(YarnConfiguration.RM_SCHEDULER_ADDRESS, "0.0.0.0:9030");

          Kindly review. Thanks.
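
The idea of replacing a hardcoded port with a free one can be sketched as follows. This is a minimal illustration using only the JDK, not the code from YARN-4251.patch; the class name `FreePortSketch` and the method `pickFreePort` are hypothetical. Note that the port is released before the caller uses it, so another process could still claim it in between.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of Hadoop: asks the OS for an unused TCP port.
class FreePortSketch {

    /** Binds to port 0 so the OS assigns a free port, then releases it. */
    static int pickFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A test could pass this string to its configuration in place of
        // the hardcoded "0.0.0.0:9030".
        String schedulerAddress = "0.0.0.0:" + pickFreePort();
        System.out.println(schedulerAddress);
    }
}
```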

          hadoopqa Hadoop QA added a comment -



          -1 overall

          Vote | Subsystem       | Runtime | Comment
          0    | pre-patch       | 6m 18s  | Pre-patch trunk compilation is healthy.
          +1   | @author         | 0m 0s   | The patch does not contain any @author tags.
          +1   | tests included  | 0m 0s   | The patch appears to include 1 new or modified test files.
          +1   | javac           | 7m 58s  | There were no new javac warning messages.
          -1   | release audit   | 0m 17s  | The applied patch generated 1 release audit warnings.
          +1   | checkstyle      | 0m 26s  | There were no new checkstyle issues.
          +1   | whitespace      | 0m 0s   | The patch has no lines that end in whitespace.
          +1   | install         | 1m 29s  | mvn install still works.
          +1   | eclipse:eclipse | 0m 33s  | The patch built with eclipse:eclipse.
          +1   | findbugs        | 0m 52s  | The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          -1   | yarn tests      | 7m 3s   | Tests failed in hadoop-yarn-client.
               |                 | 24m 59s |

          Reason            | Tests
          Failed unit tests | hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart

          Subsystem                   | Report/Notes
          Patch URL                   | http://issues.apache.org/jira/secure/attachment/12765995/YARN-4251.patch
          Optional Tests              | javac unit findbugs checkstyle
          git revision                | trunk / 7e2c971
          Release Audit               | https://builds.apache.org/job/PreCommit-YARN-Build/9399/artifact/patchprocess/patchReleaseAuditProblems.txt
          hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/9399/artifact/patchprocess/testrun_hadoop-yarn-client.txt
          Test Results                | https://builds.apache.org/job/PreCommit-YARN-Build/9399/testReport/
          Java                        | 1.7.0_55
          uname                       | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output              | https://builds.apache.org/job/PreCommit-YARN-Build/9399/console

          This message was automatically generated.

          brahmareddy Brahma Reddy Battula added a comment -

          The test case failure and the release audit warning are unrelated to this patch. I have raised YARN-4250 for the test case failure. Kindly review.

          stevel@apache.org Steve Loughran added a comment -

          Git history implies this went in with YARN-1366.

          appodictic Edward Capriolo added a comment -

          This fix does not help. I still get the same message when binding to 0.0.0.0:0, with whatever gets bundled with Cloudera 5.4.2.

          at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)

          at sun.nio.ch.Net.bind(Net.java:436)
          	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
          	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
          	at org.apache.hadoop.ipc.Server.bind(Server.java:407)
          	... 19 more
          2015-10-04 19:31:10,567 INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [0.0.0.0:0] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
          org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [0.0.0.0:0] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
          	at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
          	at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
          	at org.apache.hadoop.mapreduce.v2.app.client.MRClientService.serviceStart(MRClientService.java:119)
          	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
          	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1084)
          	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
          	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1500)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at javax.security.auth.Subject.doAs(Subject.java:415)
          	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
          	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1496)
          	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429)
          Caused by: java.net.BindException: Problem binding to [0.0.0.0:0] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
          

          Can we re-open?

          appodictic Edward Capriolo added a comment -

          Does ServerSocketUtil.getPort(45020, 10) mean that we are using a range of 10 ports? That does not seem like enough for many concurrent jobs.

          appodictic Edward Capriolo added a comment -

          To be clear, I sometimes see the above error when running multiple concurrent map/reduce processes.

          brahmareddy Brahma Reddy Battula added a comment -

          Edward Capriolo, do you mean that you applied this patch and are still getting the bind exception? Can you provide the logs so that I can look into why the same port got allocated?

          Does ServerSocketUtil.getPort(45020, 10) mean that we are using a range of 10 ports? That does not seem like enough for many concurrent jobs.

          No. It checks whether port 45020 is free. If it is not, it retries up to 10 times, trying a random free port on each retry.
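
The behaviour described above (try the preferred port first, then fall back to random ports for a bounded number of retries) can be sketched with the JDK alone. This is an illustration under stated assumptions, not the actual `ServerSocketUtil` implementation in Hadoop; the class `PortRetrySketch`, the helper `isFree`, and the choice of the 1024-65535 range for random candidates are all hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.Random;

// Hypothetical sketch of "preferred port, then random retries".
class PortRetrySketch {

    /** Returns true if a server socket can currently be bound to the port. */
    static boolean isFree(int port) {
        try (ServerSocket ignored = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /**
     * Tries the preferred port first; if it is taken, tries up to
     * {@code retries} random non-privileged ports before giving up.
     */
    static int getPort(int preferred, int retries) {
        Random random = new Random();
        int candidate = preferred;
        for (int attempt = 0; attempt <= retries; attempt++) {
            if (isFree(candidate)) {
                return candidate;
            }
            // Assumed range: any port from 1024 to 65535.
            candidate = 1024 + random.nextInt(65536 - 1024);
        }
        throw new IllegalStateException(
            "No free port found after " + retries + " retries");
    }

    public static void main(String[] args) {
        System.out.println(getPort(45020, 10));
    }
}
```

In this reading, the second argument bounds the number of attempts, not the width of a port range, so concurrent callers are not limited to 10 ports.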

          appodictic Edward Capriolo added a comment -

          I can see now that this patch is ONLY for a test case, whereas I am having the problem in production. I get this problem binding to 0.0.0.0:0.

          Looking around, I have found what I think is the related property:

          <property>
          <name>yarn.app.mapreduce.am.job.client.port-range</name>
          <value></value>
          <description>Range of ports that the MapReduce AM can use when binding.
          Leave blank if you want all possible ports.
          For example 50000-50050,50100-50200</description>
          </property>

          As far as I can tell, it is set to blank, so I do not understand why it has port conflicts. That should mean it can use ANY port.

          brahmareddy Brahma Reddy Battula added a comment -

          Edward Capriolo, as this is not related to this JIRA, please post the query to the user mailing list with logs so that it can be analysed further.

          appodictic Edward Capriolo added a comment -

          I did post this to the ML, and CDH user lists. I'm digging deeper because no one there answers. I don't know how any version of YARN could ship where launching 2-3 jobs at once causes prod issues without someone noticing it.

          stevel@apache.org Steve Loughran added a comment -

          Edward, this is not related to this JIRA. You can open another one saying "cannot bind [0.0.0.0:0] java.net.BindException: Address already in use;"

          but if you do that, I'm going to say "have you followed the wiki link? Did you follow the steps to diagnose?". So do that first, OK?

          appodictic Edward Capriolo added a comment -

          Yes. I followed the wiki link. I know what a BindException is and how to troubleshoot it.

          Possible Causes

          The port is in use (likeliest)

          Sure is but only hadoop runs on this cluster. Port 0 is not a port, and if it is supposed to pick dynamic ports I doubt they are ALL in use.

          If the port number is below 1024, the OS may be preventing your program from binding to a "trusted port"

          Why would hadoop by default bind to a trusted port?

          If the configuration is a hostname:port value, it may be that the hostname is wrong -or its IP address isn't one your machine has.

          Pretty sure 0.0.0.0 is not a wrong ip address

          There is an instance of the service already running.

          So something is running on PORT 0? Seems unlikely. Instance of what? Something that is trying to launch itself every job? I don't know?

          Also the dismissive nature of the wiki:
          "Finally, this is not a Hadoop problem, it is a host, network or Hadoop configuration problem. As it is your cluster, only you can find out and track down the problem.. Sorry"
          Everything worked fine one day. I upgrade hadoop it stops working. The wiki ends with a bold claim that every bind exception that starts the day after upgrade is not a hadoop problem.

          varun_saxena Varun Saxena added a comment -

          Edward Capriolo,
          Well, MRClientService will bind to all the IP addresses on your machine, i.e. 0.0.0.0, and will use port 0 if you do not specify one in the config above.
          Hadoop code merely uses the Java API to bind, so it is likely that all your ports are occupied for some reason.
          This is not related to this JIRA. Hence, as Brahma said, kindly send a mail to the user mailing list so that people there can suggest what to look at. There are commands which can tell you which process is occupying the ports.
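
What binding to 0.0.0.0 with port 0 does at the Java API level can be shown with a small sketch. It uses only the JDK and is not Hadoop's `Server.bind` code; the class `WildcardBindSketch` and the method `bindTwice` are hypothetical names. Port 0 asks the operating system to assign a free ephemeral port, so two sockets bound this way at the same time receive different ports.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical sketch: bind two listeners to the wildcard address, port 0.
class WildcardBindSketch {

    /** Returns the two ports the OS assigned to concurrently open sockets. */
    static int[] bindTwice() {
        try (ServerSocket first = new ServerSocket();
             ServerSocket second = new ServerSocket()) {
            first.bind(new InetSocketAddress("0.0.0.0", 0));
            second.bind(new InetSocketAddress("0.0.0.0", 0));
            return new int[] {first.getLocalPort(), second.getLocalPort()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = bindTwice();
        System.out.println("first=" + ports[0] + " second=" + ports[1]);
    }
}
```

If a bind like this fails with "Address already in use", the JDK call itself could not obtain any port, which is why the ephemeral port range on the host is one place to look.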

          stevel@apache.org Steve Loughran added a comment -

          Port 0 is "anywhere", so I doubt that's the problem. Address 0.0.0.0 is "all hosts", isn't it? So there's something wrong there.

          Like we say, please open a new JIRA for what is clearly a new issue.

          stevel@apache.org Steve Loughran added a comment -

          Also the dismissive nature of the wiki:

          "Finally, this is not a Hadoop problem, it is a host, network or Hadoop configuration problem. As it is your cluster, only you can find out and track down the problem.. Sorry"

          Everything worked fine one day. I upgrade hadoop it stops working. The wiki ends with a bold claim that every bind exception that starts the day after upgrade is not a hadoop problem.

          Edward, I can assure you that most of the JIRAs we get related to ConnectionRefused, BindException, NoRouteToHostException, etc. are related to system configs. It is almost invariably some machine config issue, be it Ubuntu mapping localhost to 127.0.1.1, a firewall in the way, rDNS broken, or others. And we get so many complaining that the namenode is refusing connections, when either the firewall is up, the port settings for the client are wrong, the hostname is wrong or the NN isn't up. Same for BindException.

          We've gone to the effort of adding wrappers around all socket exceptions to add in hostnames and ports (the things people who understand networking need), and wiki entries to help people fend for themselves and not file Critical issues about problems that they generally have to fix for themselves. Yet even with those exceptions saying "look at the wiki entry", we still get people not following the link, but going straight to JIRA: HADOOP-12391.

          If you look at the history of those wiki entries, you can see that they continually grow as we find new system setup issues which trigger the exception. That's because I do hit problems, I do fix them myself, and whenever I do that, I add another line. If you've found a new way, once fixed, I encourage you to add a new entry. And, at the same time, you are free to change that text at the end.

          ozawa Tsuyoshi Ozawa added a comment -

          +1 about this patch. Checking this in.

          ozawa Tsuyoshi Ozawa added a comment -

          Committed this to trunk and branch-2. Thanks Brahma Reddy Battula for your contribution.

          Edward Capriolo, would you mind opening a new JIRA to address the problem you mentioned?

          brahmareddy Brahma Reddy Battula added a comment -

          Tsuyoshi Ozawa, thanks a lot for committing and reviewing this issue.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #593 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/593/)
          YARN-4251. (ozawa: rev 9f4dfdf4eb60cc6b13da586dabcd95bd77fc783c)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8718 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8718/)
          YARN-4251. (ozawa: rev 9f4dfdf4eb60cc6b13da586dabcd95bd77fc783c)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2483 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2483/)
          YARN-4251. (ozawa: rev 9f4dfdf4eb60cc6b13da586dabcd95bd77fc783c)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2537 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2537/)
          YARN-4251. (ozawa: rev 9f4dfdf4eb60cc6b13da586dabcd95bd77fc783c)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #607 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/607/)
          YARN-4251. (ozawa: rev 9f4dfdf4eb60cc6b13da586dabcd95bd77fc783c)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #1330 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1330/)
          YARN-4251. (ozawa: rev 9f4dfdf4eb60cc6b13da586dabcd95bd77fc783c)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #546 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/546/)
          YARN-4251. (ozawa: rev 9f4dfdf4eb60cc6b13da586dabcd95bd77fc783c)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java

            People

            • Assignee: Brahma Reddy Battula
            • Reporter: Brahma Reddy Battula
            • Votes: 0
            • Watchers: 9
