Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Version: 0.22.0
- None
- None
- Environment: Seen on RHEL 5.5, RHEL 6.0, local dirs on ext3, Java 6u20 and 6u24
Description
In large jobs, I see roughly 1 in every million shuffle fetches fail with an EOFException while reading the SpillRecord index file. After a lot of investigation, including tracing with systemtap, it looks like the read() syscall is actually returning a premature 0 (i.e. end-of-file) with no apparent cause, even though the file has more data. So this is likely a kernel or filesystem bug that is exacerbated by some workload pattern of the TaskTracker (TT).
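To illustrate the failure mode described above, here is a minimal sketch of how a premature end-of-file from the underlying read can surface as an EOFException in a read-fully loop. It is not Hadoop's SpillRecord code: the class `PrematureEofDemo`, the `TruncatingStream` helper, and the 24-byte buffer size are all hypothetical and exist only for illustration. It assumes the usual Java convention that an `InputStream` reports end-of-file by returning -1.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class PrematureEofDemo {

    /**
     * Hypothetical stream that reports end-of-file after `limit` bytes,
     * even though the backing data is longer. It stands in for a read()
     * that returns a premature 0 at the syscall level.
     */
    static final class TruncatingStream extends InputStream {
        private final InputStream in;
        private int remaining;

        TruncatingStream(byte[] data, int limit) {
            this.in = new ByteArrayInputStream(data);
            this.remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) {
                return -1; // simulated premature EOF
            }
            remaining--;
            return in.read();
        }
    }

    /** Fills buf completely or throws EOFException, in the style of DataInputStream.readFully. */
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                // The stream claimed EOF before the expected bytes arrived.
                throw new EOFException("expected " + buf.length + " bytes, got " + off);
            }
            off += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] index = new byte[24]; // stand-in for index data; the size is arbitrary

        // Normal case: all 24 bytes are available, so this succeeds.
        readFully(new TruncatingStream(index, 24), new byte[24]);

        // Failure case: the stream reports EOF after 8 bytes.
        try {
            readFully(new TruncatingStream(index, 8), new byte[24]);
        } catch (EOFException e) {
            System.out.println("EOFException: " + e.getMessage());
        }
    }
}
```

The reader cannot tell a spurious end-of-file from a genuinely truncated file, which is why a rare bad return from read() shows up as an ordinary EOFException.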
Attachments
Issue Links
- is related to
  - MAPREDUCE-2386 TT jetty server stuck in tight loop around epoll_wait (Open)
  - MAPREDUCE-2980 Fetch failures and other related issues in Jetty 6.1.26 (Open)