Details
Description
Tez jobs can hang in shuffle waiting for a memory merge that never starts. Reserving a MapOutput increments only usedMemory, but unreserving it decrements both usedMemory and commitMemory. A transfer that fails after reserving therefore subtracts its size from commitMemory without ever having added to it. If enough shuffle failures of sufficient size occur, commitMemory may never reach the merge threshold even after all outstanding transfers have committed, and the shuffle hangs.
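The accounting problem can be reproduced with a small standalone model. The sketch below is not the Tez merge manager code: the class ShuffleMemoryModel, its methods, the buggyUnreserve flag, and the byte sizes are all hypothetical, and it assumes that a merge is triggered once commitMemory reaches the merge threshold and that commitMemory is only incremented when a transfer commits. The non-buggy branch shows one plausible correction, not necessarily the fix that was committed.

```java
// Hypothetical model of the counters described in this issue.
// It is NOT the Tez merge manager; names and sizes are illustrative only.
final class ShuffleMemoryModel {
    private final long mergeThreshold;
    private final boolean buggyUnreserve;
    private long usedMemory;
    private long commitMemory;

    ShuffleMemoryModel(long mergeThreshold, boolean buggyUnreserve) {
        this.mergeThreshold = mergeThreshold;
        this.buggyUnreserve = buggyUnreserve;
    }

    // Reserving space for a map output only touches usedMemory.
    void reserve(long size) {
        usedMemory += size;
    }

    // A successful transfer commits its bytes (assumed behavior).
    void commit(long size) {
        commitMemory += size;
    }

    // A failed transfer gives its reservation back.
    void unreserve(long size) {
        usedMemory -= size;
        if (buggyUnreserve) {
            // The reported behavior: bytes that were never committed
            // are subtracted from commitMemory as well.
            commitMemory -= size;
        }
    }

    // Assumed trigger: merge starts once committed bytes reach the threshold.
    boolean mergeWouldStart() {
        return commitMemory >= mergeThreshold;
    }

    long getUsedMemory() {
        return usedMemory;
    }

    long getCommitMemory() {
        return commitMemory;
    }

    // One failed 60-byte fetch followed by two successful 50-byte fetches,
    // against a merge threshold of 100 bytes.
    static ShuffleMemoryModel simulate(boolean buggy) {
        ShuffleMemoryModel m = new ShuffleMemoryModel(100, buggy);
        m.reserve(60);
        m.unreserve(60);   // fetch failed before commit
        m.reserve(50);
        m.commit(50);
        m.reserve(50);
        m.commit(50);
        return m;
    }

    public static void main(String[] args) {
        for (boolean buggy : new boolean[] {true, false}) {
            ShuffleMemoryModel m = simulate(buggy);
            System.out.println("buggy=" + buggy
                + " usedMemory=" + m.getUsedMemory()
                + " commitMemory=" + m.getCommitMemory()
                + " mergeWouldStart=" + m.mergeWouldStart());
        }
    }
}
```

In the buggy run, 100 bytes are in use and every outstanding transfer has committed, yet commitMemory is only 40, so the merge condition is never met. With the extra decrement removed, commitMemory reaches 100 and the merge would start.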
Attachments
Issue Links
- relates to MAPREDUCE-4842 Shuffle race can hang reducer (Closed)