OOZIE-2347: Remove unnecessary new Configuration()/new JobConf() calls from Oozie


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.0
    • Component/s: None
    • Labels: None

    Description

      We noticed that setting up the job sharelib was slow. One main cause was that many threads were blocked on "java.util.zip.ZipFile.getEntry":
      <0x00000005c0afda68> (a java.util.jar.JarFile): 0 Thread(s) sleeping, 178 Thread(s) waiting, 1 Thread(s) locking

      There are many places where we call new Configuration()/new JobConf() unnecessarily. Each new instance reloads its default resources from the classpath, which is the likely source of the JarFile lookups above, so removing these calls should be an easy performance gain.

      1. Configuration defaultConf = new Configuration(); is called for every file we add to the classpath.

      public static void addFileToClassPath(Path file, Configuration conf, FileSystem fs) throws IOException {
          Configuration defaultConf = new Configuration();
          XConfiguration.copy(conf, defaultConf);
          if (fs == null) {
              // it fails with conf, therefore we pass defaultConf instead
              fs = file.getFileSystem(defaultConf);
          }
          // Hadoop 0.20/1.x.
          if (defaultConf.get("yarn.resourcemanager.webapp.address") == null) {
              // Duplicate hadoop 1.x code to workaround MAPREDUCE-2361 in Hadoop 0.20
              // Refer OOZIE-1806.
              String filepath = file.toUri().getPath();
              String classpath = conf.get("mapred.job.classpath.files");
              conf.set("mapred.job.classpath.files", classpath == null
                  ? filepath
                  : classpath + System.getProperty("path.separator") + filepath);
              URI uri = fs.makeQualified(file).toUri();
              DistributedCache.addCacheFile(uri, conf);
          }
          else { // Hadoop 0.23/2.x
              DistributedCache.addFileToClassPath(file, conf, fs);
          }
      }
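One way to avoid the per-file cost is to load the defaults once and copy from that shared instance on each call. The sketch below only illustrates the idea and is not necessarily what the attached patches do. FakeConf is a hypothetical stand-in for Hadoop's Configuration that counts simulated default loads, and ClassPathHelper and its method names are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration (assumption:
// the no-arg constructor is the expensive path, copying is cheap).
final class FakeConf {
    // Counts simulated "load default resources from the classpath" operations.
    static int defaultLoads = 0;
    final Map<String, String> props = new HashMap<>();

    // Models new Configuration(): pays the default-loading cost every time.
    FakeConf() {
        defaultLoads++;
        props.put("fs.defaultFS", "file:///");
    }

    // Models a copy: reuses already-loaded properties without reloading.
    FakeConf(FakeConf other) {
        props.putAll(other.props);
    }
}

final class ClassPathHelper {
    // Loaded once and never mutated; callers always work on a copy.
    private static final FakeConf DEFAULTS = new FakeConf();

    // Pattern from the description: fresh defaults for every file added.
    static FakeConf perFileDefaults(FakeConf jobConf) {
        FakeConf defaultConf = new FakeConf();
        defaultConf.props.putAll(jobConf.props); // stands in for XConfiguration.copy
        return defaultConf;
    }

    // Sketch: copy the shared defaults, then overlay the job's settings.
    static FakeConf sharedDefaults(FakeConf jobConf) {
        FakeConf defaultConf = new FakeConf(DEFAULTS);
        defaultConf.props.putAll(jobConf.props);
        return defaultConf;
    }
}
```

With this shape, adding many files through sharedDefaults triggers no further default loads after the first, while perFileDefaults triggers one per file.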
      

      2. Sharelib setup also calls new Configuration(), which is not needed.

      public Configuration getShareLibConf(String inputKey, Path path) {
          Configuration conf = new Configuration();
          if (shareLibConfigMap.containsKey(inputKey)) {
              conf = shareLibConfigMap.get(inputKey).get(path);
          }

          return conf;
      }
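In the method above, the freshly created object is thrown away whenever the key is present. A minimal sketch of a lookup that allocates only on a miss follows. StubConf is a hypothetical stand-in for Hadoop's Configuration, paths are simplified to strings, and the nested map shape of shareLibConfigMap is assumed from the snippet. The attached patches may resolve this differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration; it only
// counts constructions so the difference between the two lookups is visible.
final class StubConf {
    static int created = 0;

    StubConf() {
        created++;
    }
}

final class ShareLibConfLookup {
    // Assumed shape: sharelib key -> (path -> configuration).
    private final Map<String, Map<String, StubConf>> shareLibConfigMap = new HashMap<>();

    void put(String key, String path, StubConf conf) {
        shareLibConfigMap.computeIfAbsent(key, k -> new HashMap<>()).put(path, conf);
    }

    // Pattern from the description: always allocate, then usually discard.
    StubConf getEager(String key, String path) {
        StubConf conf = new StubConf();
        if (shareLibConfigMap.containsKey(key)) {
            conf = shareLibConfigMap.get(key).get(path);
        }
        return conf;
    }

    // Sketch: allocate only when the key is missing. As in the original,
    // a known key with an unknown path still yields null.
    StubConf getLazy(String key, String path) {
        Map<String, StubConf> byPath = shareLibConfigMap.get(key);
        if (byPath != null) {
            return byPath.get(path);
        }
        return new StubConf();
    }
}
```

Both lookups return the same results for the same inputs. The lazy one simply skips the construction on the common path where the key exists.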
      

      3. CoordActionInputCheckXCommand.checkPath also creates a new JobConf every time.

      Attachments

        1. OOZIE-2347-V1.patch
          4 kB
          Purshotam Shah
        2. OOZIE-2347-V2.patch
          7 kB
          Purshotam Shah
        3. amend-OOZIE-2347-V1.patch
          3 kB
          Purshotam Shah
        4. amend-OOZIE-2347-V2.patch
          3 kB
          Purshotam Shah


          People

            Assignee: Purshotam Shah (puru)
            Reporter: Purshotam Shah (puru)
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved:
