Hadoop Map/Reduce
MAPREDUCE-734

java.util.ConcurrentModificationException observed in unreserving slots for HiRam Jobs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: capacity-sched
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Ran jobs, out of which 3 were HiRAM. The jobs were not removed from the scheduler queue even after they completed successfully.
      hadoop queue -info queue -showJobs displays something like:
      job_200907080724_0031 2 1247059146868 username NORMAL 0 running map tasks using 0 map slots. 0 additional slots reserved. 0 running reduce tasks using 0 reduce slots. 60 additional slots reserved.
      job_200907080724_0030 2 1247059146972 username NORMAL 0 running map tasks using 0 map slots. 0 additional slots reserved. 0 running reduce tasks using 0 reduce slots. 60 additional slots reserved.

      This does not block anything, but the jobs look like zombie processes in the system.
      The JobTracker log shows java.util.ConcurrentModificationException.

      1. MAPREDUCE-734_0_20090708.patch (1 kB, Arun C Murthy)
      2. MAPREDUCE-734_0_20090708_yhadoop20.patch (1 kB, Arun C Murthy)
      3. MAPREDUCE-734-1.patch (8 kB, Sreekanth Ramakrishnan)
      4. MAPREDUCE-734-2.patch (9 kB, Sreekanth Ramakrishnan)
      5. MAPREDUCE-734-ydist.patch (2 kB, Sreekanth Ramakrishnan)
      6. MAPREDUCE-734-20.patch (2 kB, Hemanth Yamijala)

        Activity

        Karam Singh added a comment -

        Stack trace:

        2009-07-08 13:34:18,260 INFO org.apache.hadoop.ipc.Server: IPC Server handler 49 on 50300, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@61415858, false, false, true, 7367) from <host:port>: error: java.io.IOException: java.util.ConcurrentModificationException
        java.io.IOException: java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
        at java.util.HashMap$KeyIterator.next(HashMap.java:828)
        at org.apache.hadoop.mapred.JobInProgress.cancelReservedSlots(JobInProgress.java:2361)
        at org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:2267)
        at org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:2213)
        at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:952)
        at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:3899)
        at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:3090)
        at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2828)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

        Arun C Murthy added a comment -

        Karam, do you know if you submitted any jobs (in particular high-ram jobs) with zero maps and zero reduces before you saw this?

        Arun C Murthy added a comment -

        I had to fix this embarrassing bug by making a copy of the set of TaskTracker objects in JobInProgress.cancelReservedSlots before calling TaskTracker.unreserveSlots, which calls JobInProgress.unreserveTaskTracker to remove the TaskTracker object from the map. Oops! hides
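
        The failure mode described in this comment can be reproduced outside Hadoop. The following is a minimal sketch, not the actual patch: the class ReservedSlotsSketch and all of its methods are hypothetical stand-ins, and the map of tracker names to slot counts is an assumed simplification of the real data structure. It shows a HashMap being modified while its key set is iterated, and the snapshot-copy approach that avoids the exception.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the job object; names are illustrative only.
class ReservedSlotsSketch {
    // Tracker name -> number of reserved slots (an assumed simplification).
    private final Map<String, Integer> reserved = new HashMap<>();

    void reserve(String tracker, int slots) {
        reserved.put(tracker, slots);
    }

    // Plays the role of the callback that removes a tracker from the map.
    void unreserve(String tracker) {
        reserved.remove(tracker);
    }

    // Buggy shape: the map is structurally modified while its key set is iterated.
    void cancelWhileIterating() {
        for (String tracker : reserved.keySet()) {
            unreserve(tracker);
        }
    }

    // Fixed shape: iterate over a snapshot of the keys, so removals are safe.
    void cancelWithCopy() {
        List<String> snapshot = new ArrayList<>(reserved.keySet());
        for (String tracker : snapshot) {
            unreserve(tracker);
        }
    }

    int reservedCount() {
        return reserved.size();
    }

    public static void main(String[] args) {
        ReservedSlotsSketch buggy = new ReservedSlotsSketch();
        ReservedSlotsSketch fixed = new ReservedSlotsSketch();
        for (String tracker : new String[] {"tt1", "tt2", "tt3"}) {
            buggy.reserve(tracker, 2);
            fixed.reserve(tracker, 2);
        }

        try {
            buggy.cancelWhileIterating();
        } catch (ConcurrentModificationException e) {
            System.out.println("iterating the live key set threw " + e.getClass().getName());
        }

        fixed.cancelWithCopy();
        System.out.println("reservations left after copy-based cancel: " + fixed.reservedCount());
    }
}
```

        Note that the exception here is raised in a single thread: despite its name, ConcurrentModificationException only requires that a collection be modified during iteration, which matches the stack trace above, where the removal happens inside the same heartbeat call.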

        Arun C Murthy added a comment -

        Patch for yahoo hadoop-20 branch.

        Hemanth Yamijala added a comment -

        This is looking fine.

        • I would recommend we move the call to cancelReservedSlots into garbageCollect rather than in jobComplete and terminate where they are currently defined. The reason being there is another API terminateJob() which can be called to end a job as well. In that case too, we'll need to cancel the reservations. Rather than adding at a new place, I think we can instead move all the calls to garbageCollect which is guaranteed to be called in all cases. (I confirmed this by checking with the M/R team).
        • It would be good to add a test case to it. The simplest way is to use the mock object facilities being added now. For instance, I think we can use FakeObjectUtilities.FakeJobInProgress, create a bunch of TaskTracker objects and reserve slots in them for the FakeJobInProgress we create. Then we can finish the job which should trigger calls to unreserve the trackers.

        Arun, in order to save on time (since we need to run all the tests etc) I've requested Sreekanth to look at the test case.
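
        The two recommendations above can be illustrated with a small sketch. It is not the committed change: JobCleanupSketch is a hypothetical, heavily simplified class, and the assumption that each termination path ends in a garbageCollect step is taken from the comment rather than from the Hadoop source. The main method plays the role of the suggested test: reserve slots on several trackers, end the job by each path, and check that no reservation survives.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified job object; not the real JobInProgress.
class JobCleanupSketch {
    private final Set<String> reservedTrackers = new HashSet<>();
    private boolean collected = false;

    void reserve(String tracker) {
        reservedTrackers.add(tracker);
    }

    // Three ways a job can end. None cancels reservations itself;
    // each is assumed to reach garbageCollect().
    void jobComplete() {
        garbageCollect();
    }

    void terminate() {
        garbageCollect();
    }

    void terminateJob() {
        garbageCollect();
    }

    // Single cleanup point shared by every termination path.
    private void garbageCollect() {
        if (collected) {
            return; // cleanup already done by an earlier call
        }
        collected = true;
        cancelReservedSlots();
    }

    // Iterates over a copy so that removing entries is safe.
    private void cancelReservedSlots() {
        for (String tracker : new ArrayList<>(reservedTrackers)) {
            reservedTrackers.remove(tracker);
        }
    }

    int reservedCount() {
        return reservedTrackers.size();
    }

    private static JobCleanupSketch jobWithReservations() {
        JobCleanupSketch job = new JobCleanupSketch();
        job.reserve("tt1");
        job.reserve("tt2");
        job.reserve("tt3");
        return job;
    }

    public static void main(String[] args) {
        JobCleanupSketch completed = jobWithReservations();
        completed.jobComplete();
        JobCleanupSketch terminated = jobWithReservations();
        terminated.terminate();
        JobCleanupSketch killed = jobWithReservations();
        killed.terminateJob();

        System.out.println("after jobComplete: " + completed.reservedCount());
        System.out.println("after terminate: " + terminated.reservedCount());
        System.out.println("after terminateJob: " + killed.reservedCount());
    }
}
```

        Placing the cancellation in one shared cleanup step means a newly added termination path picks it up without any extra call, which is the point of the first recommendation.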

        Sreekanth Ramakrishnan added a comment -

        Attaching a file incorporating Hemanth's comments:

        1. Moved out cancelReservations() to garbageCollect()
        2. Added new test case.

        Hemanth Yamijala added a comment -

        +1. Can you please start tests and test-patch?

        Sreekanth Ramakrishnan added a comment -

        My bad, I didn't include the Apache header in the generated patch.

        Output from ant test-patch with the latest patch:

             [exec] -1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     -1 release audit.  The applied patch generated 316 release audit warnings (more than the trunk's current 315 warnings).
             [exec]
        
        Sreekanth Ramakrishnan added a comment -

        Y! distribution patch

        Sreekanth Ramakrishnan added a comment -

        All tests passed locally. Output from ant test-patch

             [exec]
             [exec] +1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
             [exec]
             [exec]
             [exec]
        
        Hemanth Yamijala added a comment -

        I just committed this. Thanks, Arun and Sreekanth!

        Hemanth Yamijala added a comment -

        Patch (MAPREDUCE-734-20.patch) for Yahoo! Hadoop distribution. The previous one had a minor error preventing compilation. Verified that this compiles fine. Also ran capacity scheduler tests for sanity.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #20 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/20/)


          People

          • Assignee: Arun C Murthy
          • Reporter: Karam Singh
          • Votes: 0
          • Watchers: 2
