OOZIE-2347

Remove unnecessary new Configuration()/new JobConf() calls from Oozie


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.0
    • Component/s: None
    • Labels: None

      Description

      We noticed that setting up the job sharelib was slow. One prime reason was that many threads were blocked on "java.util.zip.ZipFile.getEntry":
      <0x00000005c0afda68> (a java.util.jar.JarFile): 0 Thread(s) sleeping, 178 Thread(s) waiting, 1 Thread(s) locking

      There are many places where we call new Configuration()/new JobConf() unnecessarily. These calls can easily be removed to improve performance.

      1.
      Configuration defaultConf = new Configuration(); is executed for every file we add to the classpath.

      public static void addFileToClassPath(Path file, Configuration conf, FileSystem fs) throws IOException {
          Configuration defaultConf = new Configuration();
          XConfiguration.copy(conf, defaultConf);
          if (fs == null) {
              // it fails with conf, therefore we pass defaultConf instead
              fs = file.getFileSystem(defaultConf);
          }
          // Hadoop 0.20/1.x.
          if (defaultConf.get("yarn.resourcemanager.webapp.address") == null) {
              // Duplicate hadoop 1.x code to workaround MAPREDUCE-2361 in Hadoop 0.20
              // Refer OOZIE-1806.
              String filepath = file.toUri().getPath();
              String classpath = conf.get("mapred.job.classpath.files");
              conf.set("mapred.job.classpath.files", classpath == null
                  ? filepath
                  : classpath + System.getProperty("path.separator") + filepath);
              URI uri = fs.makeQualified(file).toUri();
              DistributedCache.addCacheFile(uri, conf);
          }
          else { // Hadoop 0.23/2.x
              DistributedCache.addFileToClassPath(file, conf, fs);
          }
      }
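
One way to avoid the per-file allocation is to build the default configuration only in the branch that needs it, that is, when the caller did not pass a file system. The sketch below illustrates the idea with stand-in types so that it runs without Hadoop on the classpath: FakeConf and FakeFs are hypothetical placeholders for Configuration and FileSystem, and the construction counter exists only to make the saving visible. It is not the committed patch, and it leaves out the Hadoop version check, which would have to read from a configuration that already exists.

```java
import java.util.ArrayList;
import java.util.List;

class LazyDefaultConfSketch {

    /** Hypothetical stand-in for org.apache.hadoop.conf.Configuration. */
    static class FakeConf {
        // Counts constructions so the effect of the change is observable.
        static int created = 0;
        final List<String> classpath = new ArrayList<>();

        FakeConf() {
            created++;
        }
    }

    /** Hypothetical stand-in for org.apache.hadoop.fs.FileSystem. */
    static class FakeFs {
        final FakeConf conf;

        FakeFs(FakeConf conf) {
            this.conf = conf;
        }
    }

    /** Adds one file to the classpath list and returns the file system that was used. */
    static FakeFs addFileToClassPath(String file, FakeConf conf, FakeFs fs) {
        if (fs == null) {
            // Only this branch needs a fresh default configuration.
            FakeConf defaultConf = new FakeConf();
            fs = new FakeFs(defaultConf);
        }
        conf.classpath.add(file);
        return fs;
    }

    public static void main(String[] args) {
        FakeConf jobConf = new FakeConf();
        FakeFs fs = null;
        for (String jar : new String[] {"a.jar", "b.jar", "c.jar"}) {
            // Passing the returned file system back in avoids further allocations.
            fs = addFileToClassPath(jar, jobConf, fs);
        }
        System.out.println("configurations created: " + FakeConf.created);
    }
}
```

In this sketch, only the first call (the one with a null file system) allocates a configuration. Later calls reuse the file system that the first call returned.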
      

      2.
      The sharelib setup also calls new Configuration() on every lookup, even though the instance is discarded whenever the key is found.

      public Configuration getShareLibConf(String inputKey, Path path) {
          Configuration conf = new Configuration();
          if (shareLibConfigMap.containsKey(inputKey)) {
              conf = shareLibConfigMap.get(inputKey).get(path);
          }

          return conf;
      }
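
A possible rewrite looks the key up first and allocates only in the fallback path. The sketch below assumes the same two-level map shape as shareLibConfigMap, but uses a hypothetical FakeConf class in place of Configuration and plain strings in place of Path so that it is self-contained. The register helper is also an assumption added for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class ShareLibConfLookupSketch {

    /** Hypothetical stand-in for org.apache.hadoop.conf.Configuration. */
    static class FakeConf {
        // Counts constructions so the effect of the change is observable.
        static int created = 0;

        FakeConf() {
            created++;
        }
    }

    // Same shape as shareLibConfigMap in the snippet above: key -> (path -> conf).
    private final Map<String, Map<String, FakeConf>> shareLibConfigMap = new HashMap<>();

    /** Hypothetical helper that fills the map for the example. */
    void register(String key, String path, FakeConf conf) {
        shareLibConfigMap.computeIfAbsent(key, k -> new HashMap<>()).put(path, conf);
    }

    FakeConf getShareLibConf(String inputKey, String path) {
        Map<String, FakeConf> byPath = shareLibConfigMap.get(inputKey);
        if (byPath != null) {
            // As in the original, this is null when the key is known but the path is not.
            return byPath.get(path);
        }
        // Fallback kept from the original: an empty configuration for unknown keys.
        return new FakeConf();
    }
}
```

The behaviour for known and unknown keys matches the original snippet. The only difference is that a hit no longer pays for a configuration that is immediately thrown away.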
      

      3.
      CoordActionInputCheckXCommand.checkPath also creates a new JobConf on every call.
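
For a method that is called once per path, one option is to build the configuration on first use and keep it in a field. The sketch below shows that pattern in isolation. FakeJobConf is a hypothetical stand-in for JobConf, the body of checkPath is a placeholder rather than the real existence check, and the class assumes that a single instance is used from one thread at a time.

```java
import java.util.function.Supplier;

class CachedJobConfSketch {

    /** Hypothetical stand-in for org.apache.hadoop.mapred.JobConf. */
    static class FakeJobConf {
        // Counts constructions so the effect of the change is observable.
        static int created = 0;

        FakeJobConf() {
            created++;
        }
    }

    private final Supplier<FakeJobConf> factory;
    private FakeJobConf cached; // built on first use, then reused

    CachedJobConfSketch(Supplier<FakeJobConf> factory) {
        this.factory = factory;
    }

    /** Returns the cached configuration, creating it on the first call only. */
    private FakeJobConf jobConf() {
        if (cached == null) {
            cached = factory.get();
        }
        return cached;
    }

    /** Placeholder check; the real method would ask the file system whether the path exists. */
    boolean checkPath(String path) {
        FakeJobConf conf = jobConf(); // no new instance per call
        return conf != null && !path.isEmpty();
    }
}
```

A caller would construct it as new CachedJobConfSketch(CachedJobConfSketch.FakeJobConf::new) and call checkPath for each input path. However many paths are checked, the factory runs once per instance.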

        Attachments

        1. OOZIE-2347-V2.patch
          7 kB
          Purshotam Shah
        2. OOZIE-2347-V1.patch
          4 kB
          Purshotam Shah
        3. amend-OOZIE-2347-V2.patch
          3 kB
          Purshotam Shah
        4. amend-OOZIE-2347-V1.patch
          3 kB
          Purshotam Shah


            People

            • Assignee: Purshotam Shah (puru)
            • Reporter: Purshotam Shah (puru)
            • Votes: 0
            • Watchers: 5
