  Hive / HIVE-27737

Consider extending HIVE-17574 to aux jars


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      HIVE-17574 introduced an optimization in which HDFS-based resources can optionally be localized directly from their original HDFS location instead of from a Tez session directory, which reduced HDFS overhead. The behavior is controlled by hive.resource.use.hdfs.location, so there are two cases:

      1. hive.resource.use.hdfs.location=true
      a) collect the HDFS-based temp files (added files, added jars) and reference them directly from their original location
      b) collect the local temp files (added files, added jars, aux jars, reloadable aux jars) and use the non-optimized, session-based approach

            // reference HDFS based resource directly, to use distribute cache efficiently.
            addHdfsResource(conf, tmpResources, LocalResourceType.FILE, getHdfsTempFilesFromConf(conf));
            // local resources are session based.
            tmpResources.addAll(
                addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE,
                    getLocalTempFilesFromConf(conf), null).values()
            );
      

      2. hive.resource.use.hdfs.location=false
      a) original behavior: collect all resources in HS2's scope (added files, added jars, aux jars, reloadable aux jars) and copy them to a session-based directory

            // all resources including HDFS are session based.
            tmpResources.addAll(
                addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE,
                    getTempFilesFromConf(conf), null).values()
            );
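      The routing described by cases 1 and 2 can be summarized as a small decision function. This is a hypothetical model, not Hive code: the class CurrentRouting, the enums ResourceKind and Placement, and the method placementFor are invented for illustration, and a resource is treated as HDFS-based purely by its hdfs:// scheme. It shows that, today, an aux jar never takes the case 1a path even when it already lives on HDFS.

```java
import java.net.URI;

/**
 * Hypothetical model (not Hive code) of where each resource is localized from
 * before the proposed change, depending on hive.resource.use.hdfs.location.
 */
public class CurrentRouting {

    /** Resource categories named in the description; enum names are illustrative. */
    enum ResourceKind { ADDED_FILE, ADDED_JAR, AUX_JAR, RELOADABLE_AUX_JAR }

    /** Where the resource is localized from. */
    enum Placement { ORIGINAL_HDFS_LOCATION, SESSION_DIR }

    /** Assumption: "HDFS-based" is decided by the URI scheme alone. */
    static boolean isHdfs(URI uri) {
        return "hdfs".equalsIgnoreCase(uri.getScheme());
    }

    /**
     * @param useHdfsLocation value of hive.resource.use.hdfs.location
     */
    static Placement placementFor(ResourceKind kind, URI uri, boolean useHdfsLocation) {
        if (!useHdfsLocation) {
            // Case 2: every resource is copied to the session-based directory.
            return Placement.SESSION_DIR;
        }
        boolean added = kind == ResourceKind.ADDED_FILE || kind == ResourceKind.ADDED_JAR;
        if (added && isHdfs(uri)) {
            // Case 1a: added files/jars already on HDFS are referenced in place.
            return Placement.ORIGINAL_HDFS_LOCATION;
        }
        // Case 1b: local added resources and all aux jars (even hdfs:// ones) use the session dir.
        return Placement.SESSION_DIR;
    }

    public static void main(String[] args) {
        URI jar = URI.create("hdfs:///tmp/some_distributed.jar");
        System.out.println(placementFor(ResourceKind.ADDED_JAR, jar, true));  // ORIGINAL_HDFS_LOCATION
        System.out.println(placementFor(ResourceKind.AUX_JAR, jar, true));    // SESSION_DIR
        System.out.println(placementFor(ResourceKind.ADDED_JAR, jar, false)); // SESSION_DIR
    }
}
```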
      

      My proposal relates to case 1).
      Say a user wants to load an aux jar from HDFS and sets it in hive.aux.jars.path:

      hive.aux.jars.path=file:///opt/some_local_jar.jar,hdfs:///tmp/some_distributed.jar
      

      In this case, we can distinguish between file:// scheme resources and hdfs:// scheme resources:

      • file:// resources should fall into 1b) and still be used from the session directory
      • hdfs:// resources should fall into 1a) and simply be handled by addHdfsResource
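      A minimal sketch of the proposed split, assuming hive.aux.jars.path is a comma-separated list of URIs like the example value above. AuxJarSplitter, Split and split are hypothetical names, not Hive APIs. Only the hdfs scheme is routed to case 1a, as in the proposal; scheme-less entries are assumed to be local paths, and other distributed schemes are deliberately left on the session-dir path.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: split a hive.aux.jars.path value by URI scheme. */
public class AuxJarSplitter {

    /** hdfs:// jars (candidates for case 1a) and all other jars (case 1b). */
    static final class Split {
        final List<URI> hdfsJars;
        final List<URI> localJars;

        Split(List<URI> hdfsJars, List<URI> localJars) {
            this.hdfsJars = Collections.unmodifiableList(hdfsJars);
            this.localJars = Collections.unmodifiableList(localJars);
        }
    }

    static Split split(String auxJarsPath) {
        List<URI> hdfs = new ArrayList<>();
        List<URI> local = new ArrayList<>();
        if (auxJarsPath == null) {
            return new Split(hdfs, local);
        }
        for (String entry : auxJarsPath.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate empty entries such as a trailing comma
            }
            URI uri = URI.create(trimmed);
            if ("hdfs".equalsIgnoreCase(uri.getScheme())) {
                hdfs.add(uri);  // would be passed to addHdfsResource (case 1a)
            } else {
                local.add(uri); // file:// or scheme-less: copied to the session dir (case 1b)
            }
        }
        return new Split(hdfs, local);
    }

    public static void main(String[] args) {
        Split s = split("file:///opt/some_local_jar.jar,hdfs:///tmp/some_distributed.jar");
        System.out.println("hdfs:  " + s.hdfsJars);
        System.out.println("local: " + s.localJars);
    }
}
```

      Splitting on the scheme keeps the existing session-based path untouched for local jars, so only the hdfs:// subset would change behavior when hive.resource.use.hdfs.location=true.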

      This needs some attention wherever aux jars are used. For example, aux jars are also expected to be loaded into HS2 session classloaders, so an HDFS-based aux jar must be handled correctly there as well.
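      One possible way to handle the classloading concern, sketched under assumptions: java.net.URL cannot open hdfs:// URLs unless a URLStreamHandlerFactory is registered, so the sketch localizes HDFS jars to local disk before building a URLClassLoader. SessionAuxJarLoader, Localizer and buildSessionLoader are hypothetical names; Localizer stands in for whatever HS2 would use to fetch the file (for instance Hadoop's FileSystem#copyToLocalFile). This is not how Hive currently loads aux jars.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: build a session classloader from aux jars that may live on HDFS. */
public class SessionAuxJarLoader {

    /** Hypothetical hook that copies a remote jar to local disk and returns the local path. */
    @FunctionalInterface
    interface Localizer {
        Path localize(URI remoteJar) throws IOException;
    }

    static URLClassLoader buildSessionLoader(List<URI> auxJars, Localizer localizer,
                                             ClassLoader parent) throws IOException {
        List<URL> urls = new ArrayList<>();
        for (URI jar : auxJars) {
            String scheme = jar.getScheme();
            if ("hdfs".equalsIgnoreCase(scheme)) {
                // java.net.URL has no hdfs:// handler by default, so use a local copy instead.
                urls.add(localizer.localize(jar).toUri().toURL());
            } else if (scheme == null) {
                // Scheme-less entry: assume it is a local filesystem path.
                urls.add(Path.of(jar.getPath()).toUri().toURL());
            } else {
                urls.add(jar.toURL()); // e.g. file:///opt/some_local_jar.jar
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }
}
```

      An alternative is registering Hadoop's FsUrlStreamHandlerFactory so that hdfs:// URLs can be opened directly, but URL.setURLStreamHandlerFactory can be called only once per JVM, which makes a per-session local copy the less invasive option in this sketch.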

            People

              Assignee: Unassigned
              Reporter: László Bodor (abstractdog)
              Votes: 0
              Watchers: 1
