Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
We noticed that setting up the job sharelib was slow, and one prime reason was that a lot of threads were blocked on "java.util.zip.ZipFile.getEntry":
<0x00000005c0afda68> (a java.util.jar.JarFile): 0 Thread(s) sleeping, 178 Thread(s) waiting, 1 Thread(s) locking
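There are a lot of places where we call new Configuration()/new JobConf() unnecessarily. Each new instance has to load its default resources from the classpath again, which is most likely what the threads above are waiting on. These calls can easily be removed to improve performance.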
1. Configuration defaultConf = new Configuration(); is called for every file we add to classpath.

public static void addFileToClassPath(Path file, Configuration conf, FileSystem fs) throws IOException {
    Configuration defaultConf = new Configuration();
    XConfiguration.copy(conf, defaultConf);
    if (fs == null) {
        // it fails with conf, therefore we pass defaultConf instead
        fs = file.getFileSystem(defaultConf);
    }
    // Hadoop 0.20/1.x.
    if (defaultConf.get("yarn.resourcemanager.webapp.address") == null) {
        // Duplicate hadoop 1.x code to workaround MAPREDUCE-2361 in Hadoop 0.20
        // Refer OOZIE-1806.
        String filepath = file.toUri().getPath();
        String classpath = conf.get("mapred.job.classpath.files");
        conf.set("mapred.job.classpath.files", classpath == null
                ? filepath
                : classpath + System.getProperty("path.separator") + filepath);
        URI uri = fs.makeQualified(file).toUri();
        DistributedCache.addCacheFile(uri, conf);
    }
    else {
        // Hadoop 0.23/2.x
        DistributedCache.addFileToClassPath(file, conf, fs);
    }
}
2. Sharelib setup also calls new Configuration(), which is not needed.

public Configuration getShareLibConf(String inputKey, Path path) {
    Configuration conf = new Configuration();
    if (shareLibConfigMap.containsKey(inputKey)) {
        conf = shareLibConfigMap.get(inputKey).get(path);
    }
    return conf;
}
3. CoordActionInputCheckXCommand.checkPath also creates a new JobConf every time.
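
To make the cost pattern in items 1 and 2 concrete, here is a minimal, self-contained sketch. It is not the patch for this issue and does not use Hadoop. CountingConf is a hypothetical stand-in for org.apache.hadoop.conf.Configuration that only counts how often it is constructed, ConfReuseSketch and register are made-up names, and paths are plain strings instead of Path. The sketch covers only the classpath-property part of addFileToClassPath (the filesystem lookup is left out) and a getShareLibConf that allocates only when no cached entry exists.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
// It only counts constructions so the number of allocations is visible.
final class CountingConf {
    static int constructions = 0;
    private final Map<String, String> props = new HashMap<>();

    CountingConf() {
        constructions++;
    }

    String get(String key) {
        return props.get(key);
    }

    void set(String key, String value) {
        props.put(key, value);
    }
}

final class ConfReuseSketch {
    // Assumed shape of the sharelib cache: key -> (path -> conf).
    private final Map<String, Map<String, CountingConf>> shareLibConfigMap = new HashMap<>();

    // Item 1 (classpath part only): extend the property on the caller's conf
    // directly, so no throwaway configuration is built per file.
    static void addFilesToClassPath(String[] files, CountingConf conf) {
        String sep = System.getProperty("path.separator");
        for (String file : files) {
            String classpath = conf.get("mapred.job.classpath.files");
            conf.set("mapred.job.classpath.files",
                    classpath == null ? file : classpath + sep + file);
        }
    }

    // Hypothetical helper to populate the cache for the example.
    void register(String inputKey, String path, CountingConf conf) {
        shareLibConfigMap.computeIfAbsent(inputKey, k -> new HashMap<>()).put(path, conf);
    }

    // Item 2: same results as the original method, but a new object is
    // created only when the key is not cached at all.
    CountingConf getShareLibConf(String inputKey, String path) {
        Map<String, CountingConf> byPath = shareLibConfigMap.get(inputKey);
        if (byPath != null) {
            return byPath.get(path); // may be null, as in the original code
        }
        return new CountingConf();
    }
}
```

In this sketch, adding any number of files to one CountingConf leaves CountingConf.constructions unchanged, and a cache hit in getShareLibConf does not allocate either. Whether the real addFileToClassPath can drop defaultConf entirely depends on the "it fails with conf" case noted in its comment, which this sketch does not model.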