Details
- Bug
- Status: Closed
- Blocker
- Resolution: Fixed
- 0.15.0
- None
- None
Description
This happens on a 1400-node cluster running a c++-pipes job with 4200 maps and 2800 reduces, using a recent nightly build patched with HADOOP-1763 (which fixes a previous 'lost task tracker' issue). Task trackers start to get lost in large numbers as the job nears completion.
Similar non-pipes jobs do not show the same problem, but it is unclear whether the problem is related to c++-pipes. It could also be dfs overload when reduce tasks close and validate all newly created dfs files. I see dfs client rpc timeout exceptions, but these alone do not explain the escalation in lost task trackers.
I also noticed that the job tracker becomes rather unresponsive, with rpc timeout and call queue overflow exceptions. The job tracker is running with 60 handlers.
Attachments
Issue Links
- relates to: HADOOP-1942 Increase the concurrency of transaction logging to edits log (Closed)