Ambari / AMBARI-21530

Service Checks During Upgrades Should Use Desired Stack

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.5.2
    • Fix Version/s: 2.5.2
    • Component/s: ambari-server
    • Labels: None

    Description

      During an upgrade from BI 4.2 to HDP 2.6, some service checks were failing. The cause is that part of the scheduler framework was overwriting the hooks and service folders set for the service checks. At the time of orchestration, the cluster's desired stack ID was still BI, but the effective stack ID used for the upgrade was HDP, and that effective ID was being clobbered.
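
To illustrate why the choice of stack ID matters here, the following is a minimal sketch and not Ambari's actual code. `StackId` and `service_package_dir` are hypothetical names, and the directory layout is taken from the traceback in this issue; the HDP path is assumed to follow the same layout.

```python
from collections import namedtuple

# Hypothetical stand-in for a stack identifier; not Ambari's own class.
StackId = namedtuple("StackId", ["name", "version"])

# Directory layout copied from the traceback in this issue.
AGENT_STACK_CACHE = "/var/lib/ambari-agent/cache/stacks"


def service_package_dir(stack_id, service_name):
    """Return the cached package folder for a service under the given stack."""
    return "{0}/{1}/{2}/services/{3}/package".format(
        AGENT_STACK_CACHE, stack_id.name, stack_id.version, service_name)


# Stack IDs involved in the upgrade described above.
source_stack = StackId("BigInsights", "4.2")
target_stack = StackId("HDP", "2.6")

# The same service check resolves to a different script folder depending on
# which stack ID the command is built from.
print(service_package_dir(source_stack, "YARN"))
print(service_package_dir(target_stack, "YARN"))
```

If the folder is derived from one stack ID while the host has the other stack's packages installed, the script that runs may look for binaries and jars in locations that no longer exist.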

      Exception on running YARN service check:

      Traceback (most recent call last):
        File "/var/lib/ambari-agent/cache/stacks/BigInsights/4.2/services/YARN/package/scripts/service_check.py", line 91, in <module>
          ServiceCheck().execute()
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
          method(env)
        File "/var/lib/ambari-agent/cache/stacks/BigInsights/4.2/services/YARN/package/scripts/service_check.py", line 54, in service_check
          user=params.smokeuser,
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
          result = function(command, **kwargs)
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
          tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
          result = _call(command, **kwargs_copy)
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
          raise ExecutionFailed(err_msg, code, out, err)
      resource_management.core.exceptions.ExecutionFailed: Execution of 'yarn org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command ls -num_containers 1 -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell*.jar' returned 1. 17/07/19 19:34:40 INFO distributedshell.Client: Initializing Client
      17/07/19 19:34:40 INFO distributedshell.Client: Running Client
      17/07/19 19:34:40 INFO client.RMProxy: Connecting to ResourceManager at sid-bigi-2.c.pramod-thangali.internal/10.240.0.47:8050
      17/07/19 19:34:40 INFO client.AHSProxy: Connecting to Application History server at sid-bigi-2.c.pramod-thangali.internal/10.240.0.47:10200
      17/07/19 19:34:40 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=1
      17/07/19 19:34:40 INFO distributedshell.Client: Got Cluster node info from ASM
      17/07/19 19:34:40 INFO distributedshell.Client: Got node report from ASM for, nodeId=sid-bigi-3.c.pramod-thangali.internal:45454, nodeAddresssid-bigi-3.c.pramod-thangali.internal:8042, nodeRackName/default-rack, nodeNumContainers0
      17/07/19 19:34:40 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
      17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=SUBMIT_APPLICATIONS
      17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=ADMINISTER_QUEUE
      17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
      17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE
      17/07/19 19:34:40 INFO distributedshell.Client: Max mem capability of resources in this cluster 10240
      17/07/19 19:34:40 INFO distributedshell.Client: Max virtual cores capabililty of resources in this cluster 3
      17/07/19 19:34:40 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment
      17/07/19 19:34:41 FATAL distributedshell.Client: Error running Client
      java.io.FileNotFoundException: File /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell*.jar does not exist
      	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:624)
      	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:850)
      	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:614)
      	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
      	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340)
      	at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2012)
      	at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1980)
      	at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1945)
      	at org.apache.hadoop.yarn.applications.distributedshell.Client.addToLocalResources(Client.java:820)
      	at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:532)
      	at org.apache.hadoop.yarn.applications.distributedshell.Client.main(Client.java:215)
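
The FileNotFoundException above names a path that still contains a literal `*`, which suggests the wildcard was never expanded because no jar matched at that location on the host. As a rough sketch of how such a mismatch could be detected before the command is launched, the helper below expands the pattern and fails early. `resolve_single_jar` is a hypothetical name and is not part of the BigInsights or HDP service check scripts.

```python
import glob


def resolve_single_jar(pattern):
    """Expand a wildcard path and return the single file it matches.

    Raises IOError when the pattern matches no file or more than one,
    instead of passing the unexpanded pattern on to another command.
    """
    matches = sorted(glob.glob(pattern))
    if len(matches) != 1:
        raise IOError("expected exactly one match for %s, found %d"
                      % (pattern, len(matches)))
    return matches[0]


if __name__ == "__main__":
    # Pattern copied from the failing command in this issue.
    pattern = "/usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell*.jar"
    try:
        print(resolve_single_jar(pattern))
    except IOError as err:
        print("service check would fail: %s" % err)
```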
      

      Attachments

        1. AMBARI-21530.patch (14 kB, Jonathan Hurley)

            People

              Assignee: Jonathan Hurley (jonathanhurley)
              Reporter: Jonathan Hurley (jonathanhurley)
              Votes: 0
              Watchers: 2
