  1. Ambari
  2. AMBARI-22628

YARN Shuffle Service Can't Be Found On Client-Only Nodes After New Cluster Install

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.6.1
    • Fix Version/s: 2.6.1
    • Component/s: None
    • Labels: None

    Description

      Installing a new cluster can produce yarn-site.xml properties whose Spark classpath contains the literal string None where a version directory under /usr/hdp is expected:

      <property>
        <name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
        <value>/usr/hdp/None/spark2/aux/*</value>
      </property>

      <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
        <value>/usr/hdp/None/spark/aux/*</value>
      </property>

      <property>
        <name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath</name>
        <value>/usr/hdp/None/spark/hdpLib/*</value>
      </property>
      

      The cause is that YARN clients on hosts without daemons never receive a restart command after the initial yarn-site.xml is written, so the correct values are never filled in. Jobs run on these nodes then fail:

      2017-12-04 10:16:41,789 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed in state INITED; cause: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
      java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
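
      To check whether a node is affected, one option is to scan its yarn-site.xml for values that contain None as a path segment. The following is a minimal sketch and not part of the attached patch: the function name find_unresolved_properties, the regular expression, and the sample configuration (including the second, unaffected property and its value) are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Matches the literal string "None" used as a whole path segment,
# e.g. /usr/hdp/None/spark2/aux/*
UNRESOLVED = re.compile(r"(^|/)None(/|$)")


def find_unresolved_properties(xml_text):
    """Return (name, value) pairs whose value has a 'None' path segment.

    Hypothetical helper for illustration; not an Ambari API.
    """
    root = ET.fromstring(xml_text)
    bad = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if UNRESOLVED.search(value):
            bad.append((name, value))
    return bad


# Sample input: the first property is taken from the description above,
# the second is an illustrative property that should not be flagged.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
    <value>/usr/hdp/None/spark2/aux/*</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle,spark2_shuffle</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name, value in find_unresolved_properties(SAMPLE):
        print(name, "->", value)
```

      On a real host, the contents of the node's yarn-site.xml would be passed in place of SAMPLE. Matching None only as a whole path segment avoids flagging values that merely contain those letters.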
      

      Attachments

        1. AMBARI-22628.patch
          14 kB
          Jonathan Hurley

            People

              Assignee: Jonathan Hurley (jonathanhurley)
              Reporter: Kishor Ramakrishnan (kramakrishnan)
              Votes: 0
              Watchers: 2
