  1. Ambari
  2. AMBARI-25604

During blueprint deploy, tasks sometimes fail with a KeyError on large clusters

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.7.6
    • Component/s: None
    • Labels:
      None

      Description

      Since AMBARI-23660, blueprint deploy does not rely on the topology cache, so the
      correct topology is sent with the command. However, the topology from the topology
      event can be wrong, as per AMBARI-23660.

      The problem occurs when the agent still tries to process the broken topology from the event. The agent needs to handle this failure with a warning. Currently it just fails the whole command.

      ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
      Traceback (most recent call last):
        File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
          command = self.generate_command(command_header)
        File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
          command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
        File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
          'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
        File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
          return f(*args, **kw)
        File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
          hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
      KeyError: 10
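
The failing lookup in the traceback indexes the host map directly with every host id listed for a component, so a single id missing from the map aborts the whole command. A minimal sketch of the "handle with a warning" behaviour described above follows. It is not the change committed for this issue: the helper name `resolve_hostnames`, its parameters, and the use of plain dicts in place of the agent's cache objects are assumptions made for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_hostnames(hosts_by_id, host_ids):
    """Return host names for host_ids, skipping ids absent from hosts_by_id.

    hosts_by_id: hypothetical stand-in for self.hosts_to_id[cluster_id],
                 modelled here as {host_id: {"hostName": ...}}.
    host_ids:    hypothetical stand-in for component_dict.hostIds.
    """
    hostnames = []
    missing = []
    for host_id in host_ids:
        host = hosts_by_id.get(host_id)  # .get avoids the KeyError
        if host is None:
            missing.append(host_id)
            continue
        hostnames.append(host["hostName"])
    if missing:
        # Warn instead of failing the whole command.
        logger.warning(
            "Topology references unknown host ids %s; skipping them", missing
        )
    return hostnames


if __name__ == "__main__":
    hosts = {1: {"hostName": "host-1"}, 2: {"hostName": "host-2"}}
    # Host id 10 is not in the map, mirroring the KeyError: 10 above.
    print(resolve_hostnames(hosts, [1, 10, 2]))
```

Skipping unknown ids is acceptable here only because, per the description, the correct topology is sent with the command itself during blueprint deploy; in other contexts a silently shortened host list could hide a real problem.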

              People

              • Assignee:
                aonishuk Andrew Onischuk
                Reporter:
                aonishuk Andrew Onischuk

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m