  1. Hadoop YARN
  2. YARN-10341

Yarn Service Container Completed event doesn't get processed

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • 3.4.0, 3.3.1
    • 3.4.0, 3.3.1
    • service-scheduler
    • None
    • Reviewed

    Description

      If there are 10 workers running and some containers get killed, after a while we see that only 9 workers are running. This is because the CONTAINER_COMPLETED event is not processed on the AM side.
      The issue is in the code below:

      public void onContainersCompleted(List<ContainerStatus> statuses) {
            for (ContainerStatus status : statuses) {
              ContainerId containerId = status.getContainerId();
              ComponentInstance instance = liveInstances.get(status.getContainerId());
              if (instance == null) {
                LOG.warn(
                    "Container {} Completed. No component instance exists. exitStatus={}. diagnostics={} ",
                    containerId, status.getExitStatus(), status.getDiagnostics());
                return;
              }
              ComponentEvent event =
                  new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
                      .setStatus(status).setInstance(instance)
                      .setContainerId(containerId);
              dispatcher.getEventHandler().handle(event);
            }
      }

      If no component instance exists for a container, the method returns immediately instead of moving on to the next status, so the remaining containers in the list are never processed. This happens when restart_policy is "ON_FAILURE".
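
      A minimal, self-contained sketch of one way to avoid the early exit: skip the unknown container with `continue` rather than `return`, so the loop still reaches the remaining statuses. This is an illustration only and is not taken from the attached patches. The class `ContainerCompletionSketch`, the string-based container IDs, and the `dispatched` list are hypothetical stand-ins for the real `ContainerStatus`, `ComponentInstance`, and dispatcher types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the service AM callback; not the real Hadoop class.
public class ContainerCompletionSketch {
  // Stand-in for liveInstances: container ID -> component instance name.
  private final Map<String, String> liveInstances = new HashMap<>();
  // Stand-in for the dispatcher: records every event that would be sent.
  private final List<String> dispatched = new ArrayList<>();

  public void addLiveInstance(String containerId, String instanceName) {
    liveInstances.put(containerId, instanceName);
  }

  public List<String> getDispatched() {
    return dispatched;
  }

  public void onContainersCompleted(List<String> containerIds) {
    for (String containerId : containerIds) {
      String instance = liveInstances.get(containerId);
      if (instance == null) {
        System.err.println("Container " + containerId
            + " completed. No component instance exists.");
        // Skip only this container; a 'return' here would drop
        // every status that follows it in the list.
        continue;
      }
      // Stand-in for dispatching a CONTAINER_COMPLETED event.
      dispatched.add(instance + ":" + containerId);
    }
  }

  public static void main(String[] args) {
    ContainerCompletionSketch am = new ContainerCompletionSketch();
    am.addLiveInstance("container_2", "worker-1");
    // container_1 has no live instance and comes first in the list.
    am.onContainersCompleted(Arrays.asList("container_1", "container_2"));
    System.out.println(am.getDispatched());
  }
}
```

      With `return` in place of `continue`, the event for `container_2` in this sketch would never be recorded, which mirrors the missing worker described above.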

      Attachments

        1. YARN-10341.004.patch
          8 kB
          Bilwa S T
        2. YARN-10341.003.patch
          8 kB
          Bilwa S T
        3. YARN-10341.002.patch
          8 kB
          Bilwa S T
        4. YARN-10341.001.patch
          1 kB
          Bilwa S T

          People

            BilwaST Bilwa S T
            BilwaST Bilwa S T