Description
The YARN shared cache manager is designed so that only the application master uploads a job's jars, files, and other resources. However, a bug present in the code since 2.9.0 causes every node manager (NM) that runs a task of the job to attempt the upload as well. For example, if a job has 5000 tasks, up to 5000 NMs may try to upload the same jar. The effect resembles a DDoS attack and snowballs: the YARN shared cache manager becomes unavailable, localization times out, and the job fails.
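The scale of the problem can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: the class, enum, and method names are hypothetical and are not Hadoop APIs, and tasks are assumed to be spread round-robin across node managers. It counts how many upload attempts reach the shared cache manager under the buggy behavior versus the intended design.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model, not Hadoop code: counts upload attempts per job.
class SharedCacheUploadSketch {

    // Hypothetical names for the two behaviors described in this issue.
    enum Policy {
        EVERY_NODE_MANAGER,       // buggy: each NM that runs a task tries to upload
        DESIGNATED_UPLOADER_ONLY  // intended: a single upload per resource
    }

    static int countUploadAttempts(int taskCount, int nodeManagerCount, Policy policy) {
        if (taskCount <= 0 || nodeManagerCount <= 0) {
            return 0; // nothing runs, so nothing is uploaded
        }
        if (policy == Policy.DESIGNATED_UPLOADER_ONLY) {
            return 1; // one upload regardless of job size
        }
        // Assumption: task i lands on node manager (i mod nodeManagerCount).
        Set<Integer> uploaders = new HashSet<>();
        for (int task = 0; task < taskCount; task++) {
            uploaders.add(task % nodeManagerCount);
        }
        return uploaders.size(); // every distinct NM attempts the upload
    }

    public static void main(String[] args) {
        System.out.println("buggy:    "
            + countUploadAttempts(5000, 5000, Policy.EVERY_NODE_MANAGER));
        System.out.println("intended: "
            + countUploadAttempts(5000, 5000, Policy.DESIGNATED_UPLOADER_ONLY));
    }
}
```

In this model the number of attempts under the buggy behavior grows with the number of distinct NMs running the job's tasks, while the intended design keeps it constant.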
Issue Links
- relates to MAPREDUCE-6824 TaskAttemptImpl#createCommonContainerLaunchContext is longer than 150 lines (Resolved)