- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.1.2
- Fix Version/s: None
- Component/s: mrv2
- Labels: None
A map or reduce attempt remains in the NEW state, and the job gets stuck under certain conditions.
The sequence that leads to this is:
- The total number of map (or reduce) tasks equals the running limit for that task type (mapreduce.job.running.map.limit / mapreduce.job.running.reduce.limit).
- The job starts and all of its map (or reduce) tasks are running; then an attempt fails for some reason.
- A new container is requested because the attempt failed.
- The new container is allocated quickly.
- However, the new container is released because the failed attempt has not been cleaned up yet (allocated == total == running limit).
- The failed attempt is subsequently terminated, but the new attempt keeps waiting forever.
- The job is stuck.
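The condition in the steps above can be illustrated with a small toy model. This is only a sketch of the arithmetic as described in this report, not the actual allocator code in the MR ApplicationMaster; the function name `handle_allocation` and its parameters are hypothetical.

```python
def handle_allocation(running_limit, assigned, new_containers):
    """Toy model (hypothetical, not Hadoop code).

    Given the running limit, the number of containers still counted as
    assigned, and the number of newly allocated containers, return a
    tuple (accepted, released).
    """
    # Room left under the running limit; never negative.
    room = max(0, running_limit - assigned)
    accepted = min(new_containers, room)
    released = new_containers - accepted
    return accepted, released


# Scenario from the report: total tasks == running limit == 4.
running_limit = 4

# All 4 tasks are running. One attempt fails, but it has not been
# cleaned up yet, so it is still counted as assigned.
stuck_case = handle_allocation(running_limit, assigned=4, new_containers=1)

# If the failed attempt had been cleaned up first, there would be room.
ok_case = handle_allocation(running_limit, assigned=3, new_containers=1)

print(stuck_case)  # (0, 1): the new container is released
print(ok_case)     # (1, 0): the new container is accepted
```

In this model the outcome depends only on whether the failed attempt is cleaned up before or after the new container arrives. If the container is released and not requested again, the new attempt has nothing to run in, which matches the stuck state described above.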
We switched to the MR framework from 2.7.1 and confirmed that the job works correctly with it.
This may be related to MAPREDUCE-6697.
Can you help me?