Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      The job initialization process no longer changes a job's run state during initialization. The reason is twofold:
      - changing state there can lead to deadlock, because state changes require circular locking (i.e., JobInProgress requires the JobTracker lock)
      - events were not raised, because these state changes were not propagated back to the JobTracker

      The JobTracker now takes care of initializing, failing, and killing the job and raising the appropriate events. The simple rule that is enforced is: "The JobTracker lock *must* be held before changing the run-state of a job".
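The locking rule in the release note can be illustrated with a minimal sketch. `Tracker` and `Job` below are hypothetical stand-ins for JobTracker and JobInProgress, not the actual Hadoop classes or the contents of the patch. The sketch only assumes that every run-state change is routed through the tracker, so the tracker lock is always taken before the job lock.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class LockOrderSketch {

    /** Hypothetical stand-in for JobInProgress. */
    static final class Job {
        private String runState = "PREP";
        private long counter = 0;

        synchronized long getCounters() { return counter; }

        synchronized String getRunState() { return runState; }

        // Intended to be called only by Tracker while it holds its own lock.
        synchronized void markFailed() {
            runState = "FAILED";
            counter++;
        }
    }

    /** Hypothetical stand-in for JobTracker. */
    static final class Tracker {
        private final Job job = new Job();

        // RPC-style read: tracker lock first, then job lock.
        synchronized long getJobCounters() { return job.getCounters(); }

        // Run-state change: same order, tracker lock first, then job lock.
        synchronized void failJob() { job.markFailed(); }

        synchronized String getJobRunState() { return job.getRunState(); }
    }

    /** Runs a reader and an initializer concurrently; returns the final run state. */
    static String runDemo() {
        Tracker tracker = new Tracker();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> reader = pool.submit(() -> {
                for (int i = 0; i < 10_000; i++) {
                    tracker.getJobCounters();
                }
            });
            // The initializer asks the tracker to fail the job instead of
            // locking the job itself and calling back into the tracker.
            Future<?> initializer = pool.submit(() -> {
                for (int i = 0; i < 10_000; i++) {
                    tracker.failJob();
                }
            });
            reader.get(10, TimeUnit.SECONDS);
            initializer.get(10, TimeUnit.SECONDS);
            return tracker.getJobRunState();
        } catch (Exception e) {
            throw new IllegalStateException("demo did not complete", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Because both paths acquire the two monitors in the same order, neither thread can hold the job lock while waiting for the tracker lock, which is the cycle that the rule is meant to rule out.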

      Description

      We are running a Hadoop cluster (version 0.20.0) and have detected the following deadlock on our JobTracker:

      "IPC Server handler 51 on 9001":
      	at org.apache.hadoop.mapred.JobInProgress.getCounters(JobInProgress.java:943)
      	- waiting to lock <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
      	at org.apache.hadoop.mapred.JobTracker.getJobCounters(JobTracker.java:3102)
      	- locked <0x00007f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
      	at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
      "pool-1-thread-2":
      	at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2017)
      	- waiting to lock <0x00007f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
      	at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2483)
      	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
      	at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2152)
      	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
      	at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2169)
      	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
      	at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2245)
      	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
      	at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:86)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:619)
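The two stacks above show a lock-order inversion: the IPC handler holds the JobTracker monitor and waits for the JobInProgress monitor, while the pool thread holds the JobInProgress monitor and waits for the JobTracker monitor. The sketch below reproduces that pattern in miniature and detects it with the standard `ThreadMXBean.findMonitorDeadlockedThreads()` call. The lock objects and thread names are hypothetical stand-ins, not Hadoop code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class DeadlockProbe {

    // Signals arrival and waits until both threads hold their first lock.
    private static void meet(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Deliberately deadlocks two daemon threads and returns how many the JVM reports. */
    static int reproduceAndCount() {
        final Object trackerLock = new Object(); // stand-in for the JobTracker monitor
        final Object jobLock = new Object();     // stand-in for the JobInProgress monitor
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread rpcHandler = new Thread(() -> {
            synchronized (trackerLock) {          // like JobTracker.getJobCounters
                meet(bothHoldFirst);
                synchronized (jobLock) {          // like JobInProgress.getCounters
                    // never reached
                }
            }
        }, "rpc-handler");

        Thread initializer = new Thread(() -> {
            synchronized (jobLock) {              // like JobInProgress.fail
                meet(bothHoldFirst);
                synchronized (trackerLock) {      // like JobTracker.finalizeJob
                    // never reached
                }
            }
        }, "job-initializer");

        // Daemon threads so the JVM can still exit despite the deadlock.
        rpcHandler.setDaemon(true);
        initializer.setDaemon(true);
        rpcHandler.start();
        initializer.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + reproduceAndCount());
    }
}
```

The latch makes the inversion deterministic: each thread takes its first monitor before either attempts the second, so the cycle always forms. On a live JVM, a thread dump (for example via `jstack`) reports the same kind of cycle in the format shown in the description.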
      
      1. MAPREDUCE-805-v1.1.patch
        8 kB
        Amar Kamat
      2. MAPREDUCE-805-v1.2.patch
        10 kB
        Amar Kamat
      3. MAPREDUCE-805-v1.3.patch
        10 kB
        Amar Kamat
      4. MAPREDUCE-805-v1.6.patch
        23 kB
        Amar Kamat
      5. MAPREDUCE-805-v1.7.patch
        23 kB
        Amar Kamat
      6. MAPREDUCE-805-v1.11.patch
        19 kB
        Amar Kamat
      7. MAPREDUCE-805-v1.11-branch-0.20.patch
        22 kB
        Amar Kamat
      8. MAPREDUCE-805-v1.12-branch-0.20.patch
        22 kB
        Amar Kamat
      9. MAPREDUCE-805-v1.12.patch
        19 kB
        Amar Kamat


            People

            • Assignee:
              Amar Kamat
              Reporter:
              Michael Tamm
            • Votes:
              0
              Watchers:
              5
