[YARN-8346] Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full" - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 3.1.0, 3.0.2
Fix Version/s: 3.1.0, 2.10.0, 3.2.0, 2.9.2, 3.0.3
Component/s: None
Labels: None
Target Version/s: 3.1.1, 3.0.3
Hadoop Flags: Reviewed

Description

During a rolling upgrade from the 2.8.4 release to 3.1, all running containers are killed and a second attempt is launched for each affected application. The container diagnostics give "Opportunistic container queue is full" as the reason the container was killed.

After a container is recovered, the NM log shows the following:

2018-05-23 17:18:50,655 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Opportunistic container [container_e06_1527075664705_0001_01_000001] will not be queued at the NMsince max queue length [0] has been reached
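The log line suggests that, after recovery, the 3.1 NM classifies container ..._000001 as opportunistic and refuses to queue it because the maximum opportunistic queue length is 0. The Java sketch below is a simplified, hypothetical model of that admission check, built only from what the log states. The names `OppQueueSketch`, `SchedulerSketch` and `admit` are invented for illustration. They are not Hadoop's `ContainerScheduler` API, and the sketch does not attempt to reproduce the actual root cause or the committed fix.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical, simplified model of an NM-side admission check for
 * opportunistic containers. Not Hadoop's ContainerScheduler API.
 */
public class OppQueueSketch {

    enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

    static final class Container {
        final String id;
        final ExecutionType type;

        Container(String id, ExecutionType type) {
            this.id = id;
            this.type = type;
        }
    }

    static final class SchedulerSketch {
        private final int maxOppQueueLength;
        private final Deque<Container> queuedOpportunistic = new ArrayDeque<>();

        SchedulerSketch(int maxOppQueueLength) {
            this.maxOppQueueLength = maxOppQueueLength;
        }

        /** Returns true if the container is accepted, false if it is rejected. */
        boolean admit(Container c) {
            if (c.type == ExecutionType.GUARANTEED) {
                // Simplification: guaranteed containers never consult the opportunistic limit.
                return true;
            }
            if (queuedOpportunistic.size() >= maxOppQueueLength) {
                // With a limit of 0 this branch is taken for every opportunistic container.
                System.out.println("Opportunistic container [" + c.id
                        + "] will not be queued at the NM since max queue length ["
                        + maxOppQueueLength + "] has been reached");
                return false;
            }
            queuedOpportunistic.addLast(c);
            return true;
        }
    }

    public static void main(String[] args) {
        // Limit of 0, matching the max queue length reported in the NM log.
        SchedulerSketch scheduler = new SchedulerSketch(0);
        String id = "container_e06_1527075664705_0001_01_000001";

        // A recovered container classified as OPPORTUNISTIC is rejected...
        System.out.println("opportunistic accepted = "
                + scheduler.admit(new Container(id, ExecutionType.OPPORTUNISTIC)));
        // ...while the same container classified as GUARANTEED is accepted.
        System.out.println("guaranteed accepted = "
                + scheduler.admit(new Container(id, ExecutionType.GUARANTEED)));
    }
}
```

In this model, a limit of 0 means any container classified as opportunistic is rejected regardless of the node's free resources. That is consistent with every recovered container being killed, not just some of them.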

The following steps were executed for the rolling upgrade:

1. Install a 2.8.4 cluster and launch an MR job with the distributed cache enabled.
2. Stop the 2.8.4 RM, then start the 3.1.0 RM with the same configuration.
3. Stop the 2.8.4 NMs batch by batch, then start the 3.1.0 NMs batch by batch.

Attachments

YARN-8346.001.patch (3 kB), uploaded 23/May/18 18:12 by Jason Darrell Lowe

Issue Links

blocks: HADOOP-15501 [Umbrella] Upgrade efforts to Hadoop 3.x (Open)

People

Assignee: Jason Darrell Lowe
Reporter: Rohith Sharma K S
Votes: 0
Watchers: 12

Dates

Created: 23/May/18 12:30
Updated: 07/Nov/18 01:34
Resolved: 24/May/18 07:08