Hadoop Common / HADOOP-2492

ConcurrentModificationException in org.apache.hadoop.ipc.Server.Responder

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.16.0
    • Component/s: ipc
    • Labels: None

    Description

      I was running Hadoop on 800 machines. After a couple of jobs had run, and after 100% of the maps of the current job had run, the JobTracker stopped responding and all TaskTrackers were lost. When I looked at the JobTracker (JT) logs, this entry seemed alarming:
      2007-12-26 19:18:30,185 WARN org.apache.hadoop.ipc.Server: Exception in Responder java.util.ConcurrentModificationException
      Following the above exception, I saw a large number of warnings like:
      2007-12-26 19:23:10,926 WARN org.apache.hadoop.ipc.Server: Call queue overflow discarding oldest call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@5a05f9, false, true, 1758) from 1.2.3.4:1234

      Judging by the number of call queue overflow warnings, the JobTracker seemed to stop processing RPCs after it hit the ConcurrentModificationException, and around that time the TaskTrackers started getting RPC timeouts.
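To illustrate what the "Call queue overflow discarding oldest call" warning implies, here is a minimal sketch of a bounded queue that drops its oldest entry when a new call arrives and nothing is draining it. The class `BoundedCallQueue` and its methods are hypothetical names made up for this illustration; this is not the code in org.apache.hadoop.ipc.Server, only an assumption about the general behavior the log message describes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; not the Hadoop IPC Server implementation.
public class BoundedCallQueue {
    private final Deque<String> calls = new ArrayDeque<>();
    private final int capacity;

    public BoundedCallQueue(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /**
     * Enqueues a call. If the queue is already full, the oldest call is
     * discarded first and returned; otherwise null is returned.
     */
    public synchronized String offer(String call) {
        String discarded = null;
        if (calls.size() >= capacity) {
            discarded = calls.pollFirst(); // drop the oldest pending call
        }
        calls.addLast(call);
        return discarded;
    }

    /** Removes and returns the oldest pending call, or null if empty. */
    public synchronized String poll() {
        return calls.pollFirst();
    }

    public synchronized int size() {
        return calls.size();
    }

    public static void main(String[] args) {
        BoundedCallQueue queue = new BoundedCallQueue(2);
        queue.offer("heartbeat-1");
        queue.offer("heartbeat-2");
        // No consumer has called poll(), so the third call overflows the queue.
        String dropped = queue.offer("heartbeat-3");
        System.out.println("discarded oldest call: " + dropped);
    }
}
```

If the consumer side stalls, as the log suggests happened here, every new heartbeat evicts an older one, which would match a steady stream of overflow warnings followed by client timeouts.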

      There were two occurrences of the ConcurrentModificationException, but the first did not seem to have any effect on the call queue.
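The report does not say what triggered the exception inside the Responder. As general background, the sketch below shows the most common way a java.util.ConcurrentModificationException arises: a collection is structurally modified while an iterator over it is still in use. The class `CmeSketch` and the `pending` set are hypothetical and stand in for whatever collection the Responder was iterating; the second method shows one conventional remedy (removing through the iterator), which is not necessarily what rpcexception.patch does.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical illustration only; not the Hadoop Responder code.
public class CmeSketch {

    /**
     * Removes entries via Set.remove while a for-each loop is iterating.
     * For a set with two or more elements, the iterator detects the
     * modification on its next step and throws.
     */
    static boolean unsafePurgeThrows(Set<Integer> pending) {
        try {
            for (Integer call : pending) {
                pending.remove(call); // modifies the set behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Removes every entry through the iterator itself, which is permitted. */
    static void safePurge(Set<Integer> pending) {
        for (Iterator<Integer> it = pending.iterator(); it.hasNext();) {
            it.next();
            it.remove();
        }
    }

    public static void main(String[] args) {
        Set<Integer> first = new HashSet<>(Arrays.asList(1, 2, 3, 4));
        System.out.println("unsafe purge threw CME: " + unsafePurgeThrows(first));

        Set<Integer> second = new HashSet<>(Arrays.asList(1, 2, 3, 4));
        safePurge(second);
        System.out.println("remaining after safe purge: " + second.size());
    }
}
```

The same exception can also appear when a second thread modifies the collection while another thread iterates, in which case the usual options are a common lock around both operations or iterating over a snapshot copy.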

      Attachments

        1. rpcexception.patch (1 kB, Dhruba Borthakur)

          People

            Assignee: Dhruba Borthakur (dhruba)
            Reporter: Devaraj Das (ddas)