YARN-3642 (Hadoop YARN)

Hadoop2 yarn.resourcemanager.scheduler.address not loaded by RMProxy.java

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.7.0
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None

    Description

      In Hadoop 2.7.0 running in distributed mode, a datanode is unable to reach the YARN scheduler. Our yarn-site.xml sets the scheduler address as follows:

         <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>qadoop-nn001.apsalar.com:8030</value>
         </property>
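
For reference, the stock default for yarn.resourcemanager.scheduler.address resolves to 0.0.0.0:8030 (the default hostname 0.0.0.0 plus port 8030), which is what a client falls back to if this property never reaches its configuration. The following is a minimal sketch of that lookup-with-fallback behaviour using only the JDK XML parser. It is not Hadoop's Configuration class, and the class name SchedulerAddressCheck and method lookup are hypothetical names used for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: looks up one property in yarn-site.xml style content.
public class SchedulerAddressCheck {
    static final String KEY = "yarn.resourcemanager.scheduler.address";
    // Stock default: ${yarn.resourcemanager.hostname}:8030 with hostname 0.0.0.0.
    static final String DEFAULT_VALUE = "0.0.0.0:8030";

    // Returns the last value defined for key, or fallback if it is absent.
    static String lookup(String xml, String key, String fallback) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        String result = fallback;
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && key.equals(names.item(0).getTextContent().trim())) {
                result = values.item(0).getTextContent().trim();
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String siteXml = "<configuration><property>"
            + "<name>" + KEY + "</name>"
            + "<value>qadoop-nn001.apsalar.com:8030</value>"
            + "</property></configuration>";
        // Property present: prints qadoop-nn001.apsalar.com:8030
        System.out.println(lookup(siteXml, KEY, DEFAULT_VALUE));
        // Property missing (file not loaded or empty): prints 0.0.0.0:8030
        System.out.println(lookup("<configuration/>", KEY, DEFAULT_VALUE));
    }
}
```

If the second case matches what a container logs, one thing worth checking is whether the yarn-site.xml containing this property is actually visible to the process that creates the proxy.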
      

      However, when an Oozie job runs, the problem shows up in the job logs for the YARN container.
      Log entries like the following show the connection failure:

      (Log excerpt, 4096 bytes shown:)
      [main] org.apache.hadoop.http.HttpServer2: Jetty bound to port 64065
      2015-05-13 17:49:33,930 INFO [main] org.mortbay.log: jetty-6.1.26
      2015-05-13 17:49:33,971 INFO [main] org.mortbay.log: Extract jar:file:/opt/local/hadoop/hadoop-2.7.0/share/hadoop/yarn/hadoop-yarn-common-2.7.0.jar!/webapps/mapreduce to /var/tmp/Jetty_0_0_0_0_64065_mapreduce____.1ayyhk/webapp
      2015-05-13 17:49:34,234 INFO [main] org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:64065
      2015-05-13 17:49:34,234 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Web app /mapreduce started at 64065
      2015-05-13 17:49:34,645 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
      2015-05-13 17:49:34,651 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
      2015-05-13 17:49:34,652 INFO Socket Reader #1 for port 38927 org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 38927
      2015-05-13 17:49:34,660 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
      2015-05-13 17:49:34,660 INFO [IPC Server listener on 38927] org.apache.hadoop.ipc.Server: IPC Server listener on 38927: starting
      2015-05-13 17:49:34,700 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
      2015-05-13 17:49:34,700 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3
      2015-05-13 17:49:34,700 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33
      2015-05-13 17:49:34,775 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
      2015-05-13 17:49:35,820 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-13 17:49:36,821 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-13 17:49:37,823 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-13 17:49:38,824 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-13 17:49:39,825 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-13 17:49:40,826 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-13 17:49:41,827 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-13 17:49:42,828 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-13 17:49:43,829 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-13 17:49:44,830 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
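
The address in these retries, 0.0.0.0:8030, is the wildcard ("any local") address rather than the address of a specific remote host, so a client on a worker node cannot use it to reach a ResourceManager running elsewhere. The short check below illustrates this with the JDK alone; the class name WildcardAddressCheck and the method isWildcard are hypothetical names for this sketch.

```java
import java.net.InetSocketAddress;

// Hypothetical helper: reports whether a host/port pair is the wildcard address.
public class WildcardAddressCheck {
    static boolean isWildcard(String host, int port) {
        // Literal IP strings are parsed directly, without a DNS lookup.
        InetSocketAddress addr = new InetSocketAddress(host, port);
        return !addr.isUnresolved() && addr.getAddress().isAnyLocalAddress();
    }

    public static void main(String[] args) {
        // The address seen in the log: prints true
        System.out.println(isWildcard("0.0.0.0", 8030));
        // A concrete host address, as used in the patch: prints false
        System.out.println(isWildcard("10.1.26.1", 8030));
    }
}
```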

      To confirm the problem, we patched the file:

      hadoop-2.7.0/src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java

      so that it "injects" yarn.resourcemanager.scheduler.address directly into the configuration before the proxy is created.

      The modified code looks like this:

        @Private
        protected static <T> T createRMProxy(final Configuration configuration,
            final Class<T> protocol, RMProxy instance) throws IOException {
          YarnConfiguration conf = (configuration instanceof YarnConfiguration)
              ? (YarnConfiguration) configuration
              : new YarnConfiguration(configuration);
          // Debug-only change: force the scheduler address into the configuration
          // (hard-coded host, no port given) to test whether the setting is the problem.
          LOG.info("LEE: changing the conf to include yarn.resourcemanager.scheduler.address at 10.1.26.1");
          conf.set("yarn.resourcemanager.scheduler.address", "10.1.26.1");
          RetryPolicy retryPolicy = createRetryPolicy(conf);
          if (HAUtil.isHAEnabled(conf)) {
            // HA enabled: the failover provider selects the ResourceManager to contact.
            RMFailoverProxyProvider<T> provider =
                instance.createRMFailoverProxyProvider(conf, protocol);
            return (T) RetryProxy.create(protocol, provider, retryPolicy);
          } else {
            // Non-HA: resolve a single ResourceManager address from the configuration.
            InetSocketAddress rmAddress = instance.getRMAddress(conf, protocol);
            LOG.info("LEE: Connecting to ResourceManager at " + rmAddress);
            T proxy = RMProxy.<T>getProxy(conf, protocol, rmAddress);
            return (T) RetryProxy.create(protocol, proxy, retryPolicy);
          }
        }
      

          People

            Assignee: Unassigned
            Reporter: Lee Hounshell