During a rolling upgrade from 2.8.4 to 3.1, all running containers are killed and a second attempt is launched for the application. The diagnostics message gives the reason the containers were killed: "Opportunistic container queue is full".
In the NM log, I see the logs below after the container is recovered.
The following steps were executed for the rolling upgrade:
- Install a 2.8.4 cluster and launch an MR job with distributed cache enabled.
- Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration.
- Stop the 2.8.4 NMs batch by batch. Start the 3.1.0 NMs batch by batch.
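
The RM and NM restart steps above can be sketched roughly as the script below. This is only an illustration, not the exact commands used for this report: the install paths `/opt/hadoop-2.8.4` and `/opt/hadoop-3.1.0`, the `DRY_RUN` switch, and the helper function names are assumptions. It uses `sbin/yarn-daemon.sh` for the 2.8.4 daemons and `bin/yarn --daemon` for the 3.1.0 daemons. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical install locations; adjust to the actual cluster layout.
OLD_HOME="${OLD_HOME:-/opt/hadoop-2.8.4}"
NEW_HOME="${NEW_HOME:-/opt/hadoop-3.1.0}"
# Assumption: DRY_RUN=1 (default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Run on the RM host: stop the 2.8.4 RM, start the 3.1.0 RM.
# Both should point at the same configuration directory.
upgrade_rm() {
  run "$OLD_HOME/sbin/yarn-daemon.sh" stop resourcemanager
  run "$NEW_HOME/bin/yarn" --daemon start resourcemanager
}

# Run on each NM host of the current batch: stop the 2.8.4 NM,
# start the 3.1.0 NM.
upgrade_nm() {
  run "$OLD_HOME/sbin/yarn-daemon.sh" stop nodemanager
  run "$NEW_HOME/bin/yarn" --daemon start nodemanager
}
```

Calling `upgrade_rm` once on the RM host, then `upgrade_nm` on the NM hosts one batch at a time, follows the order of the steps listed above. Set `DRY_RUN=0` to actually execute the commands.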