Hadoop Common / HADOOP-610

Task Tracker offerService does not adequately protect from exceptions

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.1
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels: None

      Description

      The TaskTracker's offerService loop doesn't handle exceptions such as timeouts well; when one occurs, it resets the task tracker. I believe this is the cause of most of the lost task trackers. The scenario looks like:

      1. An RPC timeout occurs in offerService.
      2. The task tracker cleans up, which takes 30 minutes, during which the task tracker is locked up.
      3. The task tracker is declared lost for not providing its heartbeat.
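
      The scenario above suggests that a transient timeout should be absorbed inside the loop rather than escaping it and triggering a full reset. The following is a minimal sketch of that idea, not the actual TaskTracker code and not the contents of lost-tt.patch. The class `OfferServiceSketch`, the `LoopStats` counters, the `heartbeat` callable, and the choice of `SocketTimeoutException` as the timeout type are all hypothetical names and assumptions made for illustration.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

/** Hypothetical sketch only; names and structure are assumptions, not Hadoop's code. */
public class OfferServiceSketch {

    /** Simple counters so a caller can observe what the loop did. */
    public static final class LoopStats {
        public int heartbeatsSent;
        public int timeoutsTolerated;
    }

    /**
     * Runs up to maxIterations heartbeats. A timeout is counted and the loop
     * moves on to the next iteration instead of letting the exception escape,
     * which (in the scenario described) would force a lengthy cleanup.
     */
    public static LoopStats offerService(Callable<Void> heartbeat, int maxIterations)
            throws Exception {
        LoopStats stats = new LoopStats();
        for (int i = 0; i < maxIterations; i++) {
            try {
                heartbeat.call();
                stats.heartbeatsSent++;
            } catch (SocketTimeoutException e) {
                // Assumed to be transient: stay in the loop and retry next time.
                stats.timeoutsTolerated++;
            }
            // Any other exception type still propagates to the caller.
        }
        return stats;
    }
}
```

      In this sketch only the timeout is swallowed; deciding which other exceptions are safe to tolerate is a separate judgment call that the real fix would have to make.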

        Attachments

        1. lost-tt.patch
          40 kB
          Owen O'Malley

            People

            • Assignee:
              Owen O'Malley (omalley)
            • Reporter:
              Owen O'Malley (omalley)
