Details
Type: Bug
Status: Open
Priority: Blocker
Resolution: Unresolved
Affects Version/s: 1.2.1
Fix Version/s: None
Component/s: None
Labels: None
Description
JobTracker is slow when a huge number of jobs are running and 30 connections are open to the JT info port (web UI) to view job status and counters. hadoop job -list took 4m22.412s.
We took jstack traces and found that most server threads are waiting on the JobTracker object, while the thread that holds the JobTracker lock (retireJobs) is itself waiting for the class-level lock on org.apache.hadoop.mapreduce.util.ResourceBundles:
"retireJobs" prio=10 tid=0x00007f2345200800 nid=0x11c1 waiting for monitor entry [0x00007f22e3499000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56)
    - waiting to lock <0x0000000197cc6218> (a java.lang.Class for org.apache.hadoop.mapreduce.util.ResourceBundles)
    at org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterName(ResourceBundles.java:89)
    at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.localizeCounterName(FrameworkCounterGroup.java:135)
    at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.access$000(FrameworkCounterGroup.java:47)
    at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup$FrameworkCounter.getDisplayName(FrameworkCounterGroup.java:75)
    at org.apache.hadoop.mapred.Counters$Counter.getDisplayName(Counters.java:130)
    at org.apache.hadoop.mapred.Counters.incrAllCounters(Counters.java:534)
    - locked <0x00000007f8411608> (a org.apache.hadoop.mapred.Counters)
    at org.apache.hadoop.mapred.JobInProgress.incrementTaskCounters(JobInProgress.java:1728)
    at org.apache.hadoop.mapred.JobInProgress.getMapCounters(JobInProgress.java:1669)
    at org.apache.hadoop.mapred.JobTracker$RetireJobs.addToCache(JobTracker.java:657)
    - locked <0x000000009644ae08> (a org.apache.hadoop.mapred.JobTracker$RetireJobs)
    at org.apache.hadoop.mapred.JobTracker$RetireJobs.run(JobTracker.java:769)
    - locked <0x00000000964c5550> (a org.apache.hadoop.mapred.FairScheduler)
    - locked <0x000000009644a9d0> (a java.util.Collections$SynchronizedMap)
    - locked <0x00000000962ac660> (a org.apache.hadoop.mapred.JobTracker)
    at java.lang.Thread.run(Thread.java:745)
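The BLOCKED thread / lock owner relationship that jstack prints above can also be read programmatically through the standard java.lang.management API, which may help when sampling the JT repeatedly under load. A minimal sketch: BlockedThreadReport, CLASS_LOCK, demo() and the "ui-request" thread name are hypothetical; demo() only recreates a single contended monitor inside one JVM, whereas a live JobTracker would have to be inspected over remote JMX or with jstack as above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class BlockedThreadReport {
    // Hypothetical stand-in for a class-wide monitor, like the one a static synchronized method locks.
    static final Object CLASS_LOCK = new Object();

    // One line per BLOCKED thread: the monitor it is waiting for and the thread that owns it.
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // Lock name and owner are always reported; details of held monitors are not needed here.
        for (ThreadInfo ti : mx.dumpAllThreads(false, false)) {
            if (ti.getThreadState() == Thread.State.BLOCKED) {
                lines.add(ti.getThreadName() + " waits for " + ti.getLockName()
                        + " held by " + ti.getLockOwnerName());
            }
        }
        return lines;
    }

    // Recreates one contended monitor in-process: the caller holds CLASS_LOCK while a
    // thread named "ui-request" tries to enter it, then reports the blocked thread.
    static List<String> demo() {
        Thread waiter = new Thread(() -> {
            synchronized (CLASS_LOCK) { /* nothing to do once the lock is acquired */ }
        }, "ui-request");
        List<String> report;
        synchronized (CLASS_LOCK) {
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.yield(); // wait until the waiter is blocked on our monitor
            }
            report = blockedThreads();
        }
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return report;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```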
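The ResourceBundles class lock is held most of the time by JT web UI threads serving jobtracker.jsp, which call getMapCounters(). In the trace below, the lock holder is inside ResourceBundle.getBundle constructing a MissingResourceException (filling in its stack trace) while it still holds the class-wide lock: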
"926410165@qtp-1732070199-56" daemon prio=10 tid=0x00007f232c4df000 nid=0x27c0 runnable [0x00007f22db7bf000]
   java.lang.Thread.State: RUNNABLE
    at java.lang.Throwable.fillInStackTrace(Native Method)
    at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
    - locked <0x000000061a49ede0> (a java.util.MissingResourceException)
    at java.lang.Throwable.<init>(Throwable.java:287)
    at java.lang.Exception.<init>(Exception.java:84)
    at java.lang.RuntimeException.<init>(RuntimeException.java:80)
    at java.util.MissingResourceException.<init>(MissingResourceException.java:85)
    at java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:1499)
    at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1322)
    at java.util.ResourceBundle.getBundle(ResourceBundle.java:1028)
    at org.apache.hadoop.mapreduce.util.ResourceBundles.getBundle(ResourceBundles.java:37)
    at org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56)
    - locked <0x0000000197cc6218> (a java.lang.Class for org.apache.hadoop.mapreduce.util.ResourceBundles)
    at org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterName(ResourceBundles.java:89)
    at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.localizeCounterName(FrameworkCounterGroup.java:135)
    at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.access$000(FrameworkCounterGroup.java:47)
    at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup$FrameworkCounter.getDisplayName(FrameworkCounterGroup.java:75)
    at org.apache.hadoop.mapred.Counters$Counter.getDisplayName(Counters.java:130)
    at org.apache.hadoop.mapred.Counters.incrAllCounters(Counters.java:534)
    - locked <0x00000007ed1024b8> (a org.apache.hadoop.mapred.Counters)
    at org.apache.hadoop.mapred.JobInProgress.incrementTaskCounters(JobInProgress.java:1728)
    at org.apache.hadoop.mapred.JobInProgress.getMapCounters(JobInProgress.java:1669)
    at org.apache.hadoop.mapred.JSPUtil.generateJobTable(JSPUtil.java:436)
    at org.apache.hadoop.mapred.jobtracker_jsp._jspService(jobtracker_jsp.java:202)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
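The trace above points at two costs paid on every counter-name lookup: a class-wide monitor, and a failed bundle lookup that builds a new MissingResourceException (including fillInStackTrace over a deep JSP stack) while that monitor is held. One possible mitigation, sketched here as an assumption rather than Hadoop's actual code, is to memoize bundle lookups, including misses, in a ConcurrentHashMap so repeated calls neither serialize on one lock nor rebuild the exception. CachedBundles, bundle() and getValue() are hypothetical names; Hadoop's ResourceBundles class and any upstream fix may look different.

```java
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.Optional;
import java.util.ResourceBundle;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical alternative to a static synchronized bundle lookup (not Hadoop's actual code).
public final class CachedBundles {
    // Bundle name -> bundle, or Optional.empty() if ResourceBundle.getBundle failed.
    private static final ConcurrentMap<String, Optional<ResourceBundle>> CACHE =
            new ConcurrentHashMap<>();

    private CachedBundles() {}

    static Optional<ResourceBundle> bundle(String name) {
        // computeIfAbsent applies the lookup at most once per name; later callers for the
        // same name read the cached result without taking any class-wide lock.
        return CACHE.computeIfAbsent(name, n -> {
            try {
                return Optional.of(ResourceBundle.getBundle(n, Locale.getDefault()));
            } catch (MissingResourceException e) {
                return Optional.empty(); // remember the miss so the exception is built only once
            }
        });
    }

    // Localized value for key, or defaultValue when the bundle or the key is missing.
    static String getValue(String bundleName, String key, String defaultValue) {
        Optional<ResourceBundle> b = bundle(bundleName);
        if (b.isPresent() && b.get().containsKey(key)) {
            return b.get().getString(key);
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        // No bundle with this (hypothetical) name exists, so the default is returned.
        System.out.println(getValue("no.such.CounterGroup", "SOME_COUNTER", "Some counter"));
    }
}
```

The sketch assumes the default locale does not change at runtime and that the set of bundle names (one per counter group) is small, so the cache is never evicted.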
Every running job keeps updating its counters, and all 30 UI clients repeatedly read and localize these frequently updated counters, which leads to the JT slowness.
With no JT UI requests, hadoop job -list completes in seconds.
How can the JT slowness be fixed when 30 sessions want to see the job status and counters of a huge number of concurrently running jobs?
Is there any workaround, such as caching in the JT UI or offloading part of the JT UI front page when the load is heavy?
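On the caching question, one common pattern is to let all UI requests within a short window share a single rendering of the expensive part of the page (in the trace, the job table built via JSPUtil.generateJobTable). A minimal sketch, assuming a few seconds of staleness is acceptable on the front page; TtlCache and its constructor parameters are hypothetical and not part of Hadoop, and wiring it into jobtracker.jsp is not shown.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper: caches the result of an expensive render for a fixed TTL.
public final class TtlCache<T> {
    private final Supplier<T> loader;   // e.g. code that builds the job table HTML
    private final long ttlNanos;
    private final LongSupplier clock;   // injectable for testing; System::nanoTime by default
    private T value;
    private long loadedAt;
    private boolean loaded;

    public TtlCache(Supplier<T> loader, long ttl, TimeUnit unit) {
        this(loader, ttl, unit, System::nanoTime);
    }

    public TtlCache(Supplier<T> loader, long ttl, TimeUnit unit, LongSupplier clock) {
        this.loader = loader;
        this.ttlNanos = unit.toNanos(ttl);
        this.clock = clock;
    }

    // Synchronized on purpose: callers arriving during a refresh wait for that one render
    // instead of each starting their own, so many clients cost one render per TTL window.
    public synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= ttlNanos) {
            value = loader.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }

    public static void main(String[] args) {
        int[] renders = {0};
        TtlCache<String> jobTable =
                new TtlCache<>(() -> "job table render #" + (++renders[0]), 5, TimeUnit.SECONDS);
        for (int i = 0; i < 30; i++) {
            jobTable.get(); // 30 page requests arriving within the TTL window
        }
        System.out.println("renders: " + renders[0]);
    }
}
```

The trade-off is that counters shown on the front page may be up to one TTL old, and a slow render still blocks the requests waiting on it; it only removes the duplicate work across concurrent sessions.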