  1. Ambari (Retired)
  2. AMBARI-25604

During blueprint deploy tasks sometimes fail due to KeyError on large clusters

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.7.6
    • Component/s: None
    • Labels: None

    Description

      Since AMBARI-23660, a blueprint deploy does not rely on the topology cache, so the correct topology is sent with the command. However, the topology from the topology event can still be wrong, as described in AMBARI-23660.

      The problem occurs when the agent still tries to process the broken topology from the event. The agent needs to handle this failure with a warning. Currently it just fails the whole command.

      ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
      Traceback (most recent call last):
        File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
          command = self.generate_command(command_header)
        File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
          command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
        File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
          'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
        File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
          return f(*args, **kw)
        File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
          hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
      KeyError: 10
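
The failure above comes from an unguarded dictionary lookup on a host id that the cached topology does not contain. A minimal sketch of the "handle with a warning" behaviour the description asks for could look like the following. It is an illustration only, not the patch that resolved this issue: `resolve_hostnames` and the flat `hosts_by_id` dict are hypothetical, simplified stand-ins for the lookup in `ClusterTopologyCache.get_cluster_host_info`.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_hostnames(hosts_by_id, host_ids, component_name):
    """Map host ids to host names, skipping ids missing from the cache.

    hosts_by_id is a hypothetical dict of host id -> host name, standing in
    for the per-cluster host mapping used by the agent.
    """
    hostnames = []
    missing = []
    for host_id in host_ids:
        try:
            hostnames.append(hosts_by_id[host_id])
        except KeyError:
            # Broken topology: remember the id instead of failing the command.
            missing.append(host_id)
    if missing:
        logger.warning(
            "Topology for component %s references unknown host ids %s; skipping them",
            component_name, missing)
    return hostnames


# Host id 10 is absent from the cache, as in the traceback above.
cached_hosts = {1: "host1.example.com", 2: "host2.example.com"}
result = resolve_hostnames(cached_hosts, [1, 10, 2], "DATANODE")
```

With this approach the command proceeds using the hosts that are known, and the warning leaves a trace of the inconsistent topology event in the agent log.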

            People

              Assignee: Andrew Onischuk (aonishuk)
              Reporter: Andrew Onischuk (aonishuk)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m