Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
A MapReduce application processing hundreds of gigabytes with DataSkewPolicy occasionally hangs in the map stage: one map-stage TaskGroup never finishes.
There may be a race condition in how a map-stage TaskGroup's state is moved from ON_HOLD to COMPLETE, but the root cause still needs to be found.
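The report only suspects a race in the ON_HOLD to COMPLETE transition and does not identify the faulty code. As a hedged illustration of one common guard against lost updates in such a transition, the Java sketch below makes every state change an atomic compare-and-set against the state the caller expects. Only the ON_HOLD and COMPLETE state names come from this issue. The EXECUTING state, the transition table, and the names TaskGroupStateMachine and countCompletionWinners are hypothetical and are not the project's actual scheduler API.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical states: only ON_HOLD and COMPLETE are taken from the issue.
enum TaskGroupState { EXECUTING, ON_HOLD, COMPLETE }

final class TaskGroupStateMachine {
  // Hypothetical table of legal transitions; COMPLETE is terminal.
  private static final Map<TaskGroupState, Set<TaskGroupState>> LEGAL = Map.of(
      TaskGroupState.EXECUTING, EnumSet.of(TaskGroupState.ON_HOLD, TaskGroupState.COMPLETE),
      TaskGroupState.ON_HOLD, EnumSet.of(TaskGroupState.COMPLETE),
      TaskGroupState.COMPLETE, EnumSet.noneOf(TaskGroupState.class));

  private final AtomicReference<TaskGroupState> state =
      new AtomicReference<>(TaskGroupState.EXECUTING);

  TaskGroupState get() {
    return state.get();
  }

  /**
   * Atomically moves from {@code from} to {@code to}. Returns false if the move is
   * illegal or if another thread changed the state first (the caller's view is stale),
   * so a late handler cannot silently overwrite a newer state.
   */
  boolean transition(TaskGroupState from, TaskGroupState to) {
    if (!LEGAL.get(from).contains(to)) {
      return false;
    }
    return state.compareAndSet(from, to);
  }

  /** Races {@code threads} callers on ON_HOLD -> COMPLETE and counts how many succeed. */
  static int countCompletionWinners(int threads) {
    TaskGroupStateMachine machine = new TaskGroupStateMachine();
    machine.transition(TaskGroupState.EXECUTING, TaskGroupState.ON_HOLD);
    AtomicInteger winners = new AtomicInteger();
    List<Thread> workers = new ArrayList<>();
    for (int i = 0; i < threads; i++) {
      Thread worker = new Thread(() -> {
        if (machine.transition(TaskGroupState.ON_HOLD, TaskGroupState.COMPLETE)) {
          winners.incrementAndGet();
        }
      });
      workers.add(worker);
      worker.start();
    }
    try {
      for (Thread worker : workers) {
        worker.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for workers", e);
    }
    return winners.get();
  }
}
```

Because compareAndSet checks the expected state, exactly one of the racing callers in countCompletionWinners wins, and a handler that still believes the TaskGroup is in an older state fails instead of overwriting COMPLETE. Whether a lost update of this kind is what actually caused the hang is not established by this report.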