Hadoop Common / HADOOP-1140

Deadlock bug involving the o.a.h.metrics package


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.1
    • Fix Version/s: 0.12.2
    • Component/s: metrics
    • Labels: None

    Description

      Hi David,

      Our nightly benchmarks are occasionally failing (2 to 4 of them per night) because of this deadlock in the JobTracker (JT), which looks to be caused by Simon. Do you have time to fix this in the morning?

      Thanks,
      Nige

      Found one Java-level deadlock:
      =============================
      "expireLaunchingTasks":
      waiting to lock monitor 0x08141b44 (object 0x57eafdd0, a org.apache.hadoop.mapred.JobTracker),
      which is held by "IPC Server handler 8 on 50020"
      "IPC Server handler 8 on 50020":
      waiting to lock monitor 0x08141630 (object 0x57de46b8, a com.yahoo.simon.hadoop.metrics.SimonContext),
      which is held by "Timer-0"
      "Timer-0":
      waiting to lock monitor 0x08141b44 (object 0x57eafdd0, a org.apache.hadoop.mapred.JobTracker),
      which is held by "IPC Server handler 8 on 50020"

      Java stack information for the threads listed above:
      ===================================================
      "expireLaunchingTasks":
          at org.apache.hadoop.mapred.JobTracker$ExpireLaunchingTasks.run(JobTracker.java:152)
          - waiting to lock <0x57eafdd0> (a org.apache.hadoop.mapred.JobTracker)
          at java.lang.Thread.run(Thread.java:619)
      "IPC Server handler 8 on 50020":
          at org.apache.hadoop.metrics.spi.AbstractMetricsContext.createRecord(AbstractMetricsContext.java:192)
          - waiting to lock <0x57de46b8> (a com.yahoo.simon.hadoop.metrics.SimonContext)
          at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:130)
          at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1383)
          - locked <0x57eafdd0> (a org.apache.hadoop.mapred.JobTracker)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
      "Timer-0":
          at org.apache.hadoop.mapred.JobTracker.getRunningJobs(JobTracker.java:943)
          - waiting to lock <0x57eafdd0> (a org.apache.hadoop.mapred.JobTracker)
          at org.apache.hadoop.mapred.JobTracker$JobTrackerMetrics.doUpdates(JobTracker.java:429)
          at org.apache.hadoop.metrics.spi.AbstractMetricsContext.timerEvent(AbstractMetricsContext.java:275)
          - locked <0x57de46b8> (a com.yahoo.simon.hadoop.metrics.SimonContext)
          at org.apache.hadoop.metrics.spi.AbstractMetricsContext.access$000(AbstractMetricsContext.java:48)
          at org.apache.hadoop.metrics.spi.AbstractMetricsContext$1.run(AbstractMetricsContext.java:242)
          at java.util.TimerThread.mainLoop(Timer.java:512)
          at java.util.TimerThread.run(Timer.java:462)

      Found 1 deadlock.
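
The dump above shows a lock-order inversion: the IPC handler takes the JobTracker lock and then wants the metrics context lock, while the timer thread takes the context lock and then wants the JobTracker lock. The sketch below illustrates one common way to break such a cycle, namely copying the list of updaters while holding the context lock and calling them only after releasing it. It is not the contents of 1140.patch; the class names `Context` and `Tracker` and the method `runDemo` are hypothetical stand-ins, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class LockOrderSketch {

    /** Hypothetical stand-in for a metrics context. */
    static class Context {
        private final List<Runnable> updaters = new ArrayList<>();
        private int records = 0;

        synchronized void registerUpdater(Runnable updater) {
            updaters.add(updater);
        }

        /** Called by the tracker while it already holds its own lock. */
        synchronized int createRecord() {
            return ++records;
        }

        /** Timer callback: snapshot under the lock, run callbacks outside it. */
        void timerEvent() {
            List<Runnable> snapshot;
            synchronized (this) {
                snapshot = new ArrayList<>(updaters);
            }
            for (Runnable updater : snapshot) {
                updater.run(); // the context lock is NOT held here
            }
        }
    }

    /** Hypothetical stand-in for the JobTracker. */
    static class Tracker {
        private final Context context;
        private int runningJobs = 0;

        Tracker(Context context) {
            this.context = context;
            context.registerUpdater(() -> getRunningJobs());
        }

        synchronized int getRunningJobs() {
            return runningJobs;
        }

        synchronized void submitJob() {
            context.createRecord(); // order: tracker lock, then context lock
            runningJobs++;
        }
    }

    /** Runs both code paths concurrently; true if both threads finish. */
    static boolean runDemo(int iterations) {
        Context context = new Context();
        Tracker tracker = new Tracker(context);
        Thread rpc = new Thread(() -> {
            for (int i = 0; i < iterations; i++) tracker.submitJob();
        });
        Thread timer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) context.timerEvent();
        });
        rpc.setDaemon(true);   // do not keep the JVM alive if a thread hangs
        timer.setDaemon(true);
        rpc.start();
        timer.start();
        try {
            rpc.join(10_000);  // bounded wait so a deadlock shows up as false
            timer.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !rpc.isAlive() && !timer.isAlive()
                && tracker.getRunningJobs() == iterations;
    }

    public static void main(String[] args) {
        System.out.println("finished without deadlock: " + runDemo(100_000));
    }
}
```

If `timerEvent` were declared `synchronized` and invoked the updaters while still holding the context lock, the two threads would acquire the same pair of locks in opposite orders, which is the cycle reported between "IPC Server handler 8 on 50020" and "Timer-0".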

      Attachments

        1. 1140.patch
          9 kB
          David Bowen


          People

            Assignee: David Bowen (dbowen)
            Reporter: David Bowen (dbowen)
            Votes: 0
            Watchers: 0
