Hadoop Map/Reduce / MAPREDUCE-2389

Spurious EOFExceptions reading SpillRecord index files

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.22.0
    • Fix Version/s: None
    • Component/s: tasktracker
    • Labels: None
    • Environment: Seen on RHEL 5.5 and RHEL 6.0, with local dirs on ext3, under Java 6u20 and 6u24

    Description

      In large jobs, I see roughly 1 shuffle fetch in every million fail with an EOFException while reading the SpillRecord index file. After a lot of investigation, including tracing with systemtap, it looks like the read() syscall itself is returning a premature 0 (end of file) for no apparent reason. This is therefore likely a kernel or filesystem bug that is exacerbated by some workload the TaskTracker (TT) generates.
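
      To illustrate how a premature end-of-file from read() turns into an EOFException on the Java side, here is a minimal sketch. It is not Hadoop code and not a proposed fix: the class IndexReadSketch and both of its methods are hypothetical names introduced for this example, and it uses java.nio.file (Java 7 or later) for brevity. It assumes the file is small and is no longer being written. The reader compares the bytes actually read against the size the filesystem reports and rereads the file if the stream ends early.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration only; not part of Hadoop. */
public class IndexReadSketch {

    /** Reads exactly {@code expected} bytes, or throws EOFException if the stream ends early. */
    static byte[] readExactly(Path file, int expected) throws IOException {
        byte[] buf = new byte[expected];
        int off = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (off < expected) {
                // A 0-byte result from the underlying read() syscall is reported here as -1.
                int n = in.read(buf, off, expected - off);
                if (n < 0) {
                    throw new EOFException(
                        "EOF after " + off + " of " + expected + " bytes in " + file);
                }
                off += n;
            }
        }
        return buf;
    }

    /** Rereads the whole file up to {@code maxAttempts} times when a read ends early. */
    static byte[] readWithRetry(Path file, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        EOFException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long size = Files.size(file); // size according to filesystem metadata
            if (size > Integer.MAX_VALUE) {
                throw new IOException("File too large for this sketch: " + file);
            }
            try {
                return readExactly(file, (int) size);
            } catch (EOFException e) {
                last = e; // the stream ended before the reported size was reached
            }
        }
        throw last;
    }
}
```

      A retry like this would only mask the symptom. If the premature 0 really comes from the kernel or filesystem, the underlying cause is still there.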

      Attachments

        1. stap-output.txt (2 kB, attached by Todd Lipcon)

            People

              Assignee: Unassigned
              Reporter: Todd Lipcon (tlipcon)
              Votes: 4
              Watchers: 17
