Details
Bug
Status: Resolved
Critical
Resolution: Fixed
3.4.0, 3.3.1
None
Description
If there are 10 workers running and some containers get killed, after a while we see that only 9 workers are running. This is because the CONTAINER_COMPLETED event is not processed on the AM side.
The issue is in the code below:
public void onContainersCompleted(List<ContainerStatus> statuses) {
  for (ContainerStatus status : statuses) {
    ContainerId containerId = status.getContainerId();
    ComponentInstance instance = liveInstances.get(status.getContainerId());
    if (instance == null) {
      LOG.warn(
          "Container {} Completed. No component instance exists. exitStatus={}. diagnostics={} ",
          containerId, status.getExitStatus(), status.getDiagnostics());
      return;
    }
    ComponentEvent event =
        new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
            .setStatus(status).setInstance(instance)
            .setContainerId(containerId);
    dispatcher.getEventHandler().handle(event);
  }
}
If no component instance exists for a container, the method returns immediately, so the remaining containers in the list are never processed and their CONTAINER_COMPLETED events are never dispatched. This happens when restart_policy is "ON_FAILURE".
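One possible way to avoid dropping the rest of the batch is to skip only the unknown container and keep iterating. The sketch below illustrates that idea in isolation; it is not the actual patch. The class ContainerCompletionSketch, the method dispatchCompleted, and the use of plain strings in place of ContainerId, ComponentInstance and the dispatcher are all hypothetical stand-ins, assumed here so the example runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContainerCompletionSketch {

    /**
     * Hypothetical stand-in for onContainersCompleted.
     * liveInstances maps a container id to its component name;
     * the returned list holds the ids for which an event would be dispatched.
     */
    static List<String> dispatchCompleted(Map<String, String> liveInstances,
                                          List<String> completedIds) {
        List<String> dispatched = new ArrayList<>();
        for (String containerId : completedIds) {
            String instance = liveInstances.get(containerId);
            if (instance == null) {
                // Skip only this container. A 'return' here would silently
                // drop every remaining container in the batch.
                System.out.println("Container " + containerId
                    + " completed. No component instance exists.");
                continue;
            }
            // Stand-in for dispatcher.getEventHandler().handle(event).
            dispatched.add(containerId);
        }
        return dispatched;
    }

    public static void main(String[] args) {
        Map<String, String> live = new HashMap<>();
        live.put("c2", "worker");
        live.put("c3", "worker");
        // "c1" has no live instance; "c2" and "c3" must still be handled.
        List<String> result =
            dispatchCompleted(live, Arrays.asList("c1", "c2", "c3"));
        System.out.println(result);
    }
}
```

With continue, the unknown container c1 is logged and skipped while c2 and c3 are still dispatched. With return in the same position, neither c2 nor c3 would be dispatched, which matches the missing worker described above.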