Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.1
    • Fix Version/s: 0.17.2
    • Component/s: None
    • Labels:
      None

      Description

      On a cluster of about 1,700 nodes, a job with about 100,000 maps and 10,000 reduces completed, and the JobTracker, even with 80 handlers, could not keep up with the RPC call load during promotion of the job. Heartbeats were discarded, and by the end the JobTracker had lost nearly all TaskTrackers (about 10 were left). Promotion took more than 40 minutes.
      The TaskTrackers reconnected and everything recovered, but this might have been just luck.
      Shouldn't there be adaptive throttling of the rate of heartbeats and TaskCompletionEvents?
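
To make the throttling question concrete, here is a minimal sketch of one possible approach: stretch the heartbeat interval as the number of TaskTrackers grows, so the aggregate heartbeat rate stays under a fixed budget. This is not the code from the attached patches. The class name `HeartbeatThrottle`, the method `intervalMillis`, and the example budget of 100 heartbeats per second with a 5-second floor are all assumptions chosen for illustration.

```java
// Hypothetical sketch only; names and numbers are illustrative assumptions,
// not taken from the JobTracker source or from the attached patches.
public final class HeartbeatThrottle {
    private HeartbeatThrottle() {}

    /**
     * Returns a heartbeat interval (ms) such that trackerCount trackers,
     * each reporting once per interval, produce at most
     * maxHeartbeatsPerSecond calls per second in aggregate.
     * The result never drops below minIntervalMillis.
     */
    public static long intervalMillis(int trackerCount,
                                      int maxHeartbeatsPerSecond,
                                      long minIntervalMillis) {
        if (trackerCount < 0 || maxHeartbeatsPerSecond <= 0 || minIntervalMillis <= 0) {
            throw new IllegalArgumentException("arguments out of range");
        }
        // Ceiling of trackerCount * 1000 / maxHeartbeatsPerSecond.
        long scaled = (trackerCount * 1000L + maxHeartbeatsPerSecond - 1)
                / maxHeartbeatsPerSecond;
        return Math.max(minIntervalMillis, scaled);
    }

    public static void main(String[] args) {
        // A cluster of the size described in this report.
        System.out.println(intervalMillis(1700, 100, 5000)); // prints 17000
        // A small cluster stays at the floor.
        System.out.println(intervalMillis(50, 100, 5000));   // prints 5000
    }
}
```

A scheme like this only bounds the heartbeat rate; the getTaskCompletionEvents polling seen in the log messages would need its own limit.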

      Sample messages:
      2008-07-22 18:21:55,831 WARN org.apache.hadoop.ipc.Server: Call queue overflow discarding oldest call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@115f6b6, false, true, 18137) from xxx
      2008-07-22 18:21:55,834 WARN org.apache.hadoop.ipc.Server: Call queue overflow discarding oldest call getTaskCompletionEvents(job_200807190635_0012, 119567, 50) from yyy
      ...
      2008-07-22 19:02:28,821 WARN org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9020, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@19d32fa, false, true, 18199) from zzz: discarded for being too old (40936)
      2008-07-22 19:02:28,821 WARN org.apache.hadoop.ipc.Server: IPC Server handler 34 on 9020, call getTaskCompletionEvents(job_200807190635_0012, 119567, 50) from uuu: discarded for being too old (40978)
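
To gauge which RPC methods are being dropped, messages like the ones above can be tallied by call name. The following is a rough sketch that assumes the log lines have the two shapes shown in the samples; the class name `DiscardedCallTally` is hypothetical and the sample strings in `main` are abridged copies of the messages above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Hadoop.
public final class DiscardedCallTally {
    // Captures the RPC method name that follows "call " and precedes "(".
    private static final Pattern CALL = Pattern.compile("\\bcall (\\w+)\\(");

    private DiscardedCallTally() {}

    /** Counts discarded calls per RPC method name, sorted by name. */
    public static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Both sample shapes contain "discard" ("discarding" / "discarded").
            if (!line.contains("discard")) {
                continue;
            }
            Matcher m = CALL.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "WARN org.apache.hadoop.ipc.Server: Call queue overflow discarding oldest call heartbeat(status, false, true, 18137) from xxx",
            "WARN org.apache.hadoop.ipc.Server: Call queue overflow discarding oldest call getTaskCompletionEvents(job_200807190635_0012, 119567, 50) from yyy",
            "WARN org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9020, call heartbeat(status, false, true, 18199) from zzz: discarded for being too old (40936)",
            "INFO unrelated line that should be ignored");
        System.out.println(tally(sample)); // prints {getTaskCompletionEvents=1, heartbeat=2}
    }
}
```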

      1. patch-3813.txt
        0.6 kB
        Amareshwari Sriramadasu
      2. patch-3813-0.17.txt
        0.6 kB
        Amareshwari Sriramadasu
      3. patch-3813-1.txt
        0.9 kB
        Arun C Murthy

        Activity

        Christian Kunz created issue -
        Devaraj Das made changes -
        Field Original Value New Value
        Assignee Amareshwari Sriramadasu [ amareshwari ]
        Amareshwari Sriramadasu made changes -
        Attachment patch-3813.txt [ 12386711 ]
        Amareshwari Sriramadasu made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Amareshwari Sriramadasu made changes -
        Attachment patch-3813-0.17.txt [ 12386779 ]
        Amareshwari Sriramadasu made changes -
        Fix Version/s 0.17.2 [ 12313296 ]
        Fix Version/s 0.18.0 [ 12312972 ]
        Fix Version/s 0.19.0 [ 12313211 ]
        Arun C Murthy made changes -
        Attachment patch-3813-1.txt [ 12386837 ]
        Arun C Murthy made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Owen O'Malley made changes -
        Fix Version/s 0.19.0 [ 12313211 ]
        Fix Version/s 0.18.0 [ 12312972 ]
        Owen O'Malley made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Owen O'Malley made changes -
        Component/s mapred [ 12310690 ]

          People

          • Assignee:
            Amareshwari Sriramadasu
            Reporter:
            Christian Kunz
          • Votes:
            0
            Watchers:
            4
