Hadoop YARN / YARN-7550

Allow YARN HA to be fault tolerant on missing DNS entries

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.6.5
    • Fix Version/s: None
    • Component/s: client
    • Labels:
      None

      Description

      If, for some reason, one of the ResourceManager hosts is missing from the DNS registry, the client-side HA configuration (ClientRMProxy) is not fault tolerant enough to survive this.

      To tolerate DNS resolution issues, the token service lookup should succeed as long as at least one of the RMs can be resolved. The relevant code is at:
      https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java#L153

      We can safely assume that an RM which is missing from DNS cannot be the active one anyway, so client jobs can still be submitted while the DNS issue is being fixed.
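
A minimal, standalone sketch of the proposed behaviour, assuming the HA token service is a comma-separated list of per-RM "ip:port" entries. It deliberately avoids the Hadoop API: the Resolver interface, the buildTokenService helper and the host names are hypothetical, chosen only so the DNS lookup can be faked. Unresolvable RMs are skipped, and the call still fails with an IllegalArgumentException wrapping an UnknownHostException (the same shape as the trace below) only when no RM resolves at all.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class TolerantTokenService {

    // Hypothetical resolver abstraction so DNS lookups can be faked in tests.
    @FunctionalInterface
    interface Resolver {
        String resolve(String host) throws UnknownHostException;
    }

    // Real DNS lookup: returns the textual IP address of the host.
    static final Resolver DNS = host -> InetAddress.getByName(host).getHostAddress();

    // Builds a comma-separated "ip:port" token service from "host:port" RM
    // addresses, skipping RMs that cannot be resolved. Fails only if none resolve.
    static String buildTokenService(List<String> rmAddresses, Resolver resolver) {
        List<String> services = new ArrayList<>();
        List<String> unresolved = new ArrayList<>();
        for (String rm : rmAddresses) {
            int colon = rm.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected host:port but got " + rm);
            }
            String host = rm.substring(0, colon);
            String port = rm.substring(colon + 1);
            try {
                services.add(resolver.resolve(host) + ":" + port);
            } catch (UnknownHostException e) {
                // An RM missing from DNS cannot be the active RM, so skip it.
                unresolved.add(host);
            }
        }
        if (services.isEmpty()) {
            // Keep the original failure mode when no RM resolves at all.
            throw new IllegalArgumentException(
                new UnknownHostException("No RM could be resolved: " + unresolved));
        }
        return String.join(",", services);
    }

    public static void main(String[] args) {
        // Fake resolver: rm1 resolves, rm2 behaves as if it were missing from DNS.
        Resolver fake = host -> {
            if (host.equals("rm1.example.com")) {
                return "10.0.0.1";
            }
            throw new UnknownHostException(host);
        };
        System.out.println(buildTokenService(
            List.of("rm1.example.com:8030", "rm2.example.com:8030"), fake));
    }
}
```

Running main prints 10.0.0.1:8030: the missing rm2 is dropped instead of aborting the whole lookup. In ClientRMProxy itself, the equivalent change would presumably be to catch the per-RM IllegalArgumentException from SecurityUtil.buildTokenService and continue with the remaining RM ids; that patch is not shown here.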

      A sample exception when one of the entries is missing:

      17/11/02 18:20:35 INFO service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl failed in state STARTED; cause: java.lang.IllegalArgumentException: java.net.UnknownHostException: some.dns.entry
      at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374) 
      at org.apache.hadoop.yarn.client.ClientRMProxy.getTokenService(ClientRMProxy.java:153) 
      at org.apache.hadoop.yarn.client.ClientRMProxy.getAMRMTokenService(ClientRMProxy.java:138) 
      at org.apache.hadoop.yarn.client.ClientRMProxy.setAMRMTokenService(ClientRMProxy.java:80) 
      at org.apache.hadoop.yarn.client.ClientRMProxy.getRMAddress(ClientRMProxy.java:99) 
      at org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.getProxyInternal(ConfiguredRMFailoverProxyProvider.java:76) 
      at org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.getProxy(ConfiguredRMFailoverProxyProvider.java:90) 
      at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:75) 
      at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:66) 
      at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58) 
      at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:95) 
      at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72) 
      at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.serviceStart(AMRMClientImpl.java:186) 
      at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
      at org.apache.spark.deploy.yarn.YarnRMClient.register(YarnRMClient.scala:65) 
      at org.apache.spark.deploy.yarn.ApplicationMaster.registerAM(ApplicationMaster.scala:359) 
      at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:435) 
      at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:256) 
      at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:774) 
      at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67) 
      at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66) 
      at java.security.AccessController.doPrivileged(Native Method) 
      at javax.security.auth.Subject.doAs(Subject.java:422) 
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) 
      at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66) 
      at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:772) 
      at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:795) 
      at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala) 
      Caused by: java.net.UnknownHostException: some.dns.entry 
      ... 28 more
      

              People

              • Assignee:
                Unassigned
                Reporter:
                Bearricade Istvan Vajnorak
              • Votes:
                0
                Watchers:
                2
