Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
HIVE-17574 introduced an optimization in which HDFS-based resources can optionally be localized directly from their "original" HDFS folder instead of from a Tez session dir, which reduces HDFS overhead. The behavior is controlled by hive.resource.use.hdfs.location, so there are 2 cases:
1. hive.resource.use.hdfs.location=true
a) collect "HDFS temp files" and optimize their access: added files, added jars
b) collect local temp files and use the non-optimized session-based approach: added files, added jars, aux jars, reloadable aux jars
    // reference HDFS based resource directly, to use distribute cache efficiently.
    addHdfsResource(conf, tmpResources, LocalResourceType.FILE, getHdfsTempFilesFromConf(conf));
    // local resources are session based.
    tmpResources.addAll(
        addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE,
            getLocalTempFilesFromConf(conf), null).values()
    );
2. hive.resource.use.hdfs.location=false
a) original behavior: collect all resources in HS2's scope (added files, added jars, aux jars, reloadable aux jars) and put them into a session-based directory
    // all resources including HDFS are session based.
    tmpResources.addAll(
        addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE,
            getTempFilesFromConf(conf), null).values()
    );
My proposal is related to case 1.
Let's say a user wants to load an aux jar from HDFS and has it set in hive.aux.jars.path:
hive.aux.jars.path=file:///opt/some_local_jar.jar,hdfs:///tmp/some_distributed.jar
In this case we can distinguish between file:// scheme resources and hdfs:// scheme resources:
- file scheme resources should fall into 1b) and still be used from the session dir
- hdfs scheme resources should fall into 1a) and simply be handled by addHdfsResource
This needs some attention at every usage of aux jars, because aux jars are e.g. supposed to be classloaded into HS2 sessions, so the case of an hdfs resource has to be handled there as well.
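To make the proposed split concrete, here is a minimal standalone sketch of partitioning a hive.aux.jars.path value by URI scheme. It is not Hive code: the class AuxJarSplitter, the holder Split and the method splitByScheme are hypothetical names used only for illustration. It also assumes that entries without a scheme should be treated as local, which is a guess and not something decided in this issue.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive: splits aux jar entries by URI scheme.
class AuxJarSplitter {

    /** Result of the split: one list per handling path described above. */
    static final class Split {
        // Candidates for case 1a (referenced directly on HDFS).
        final List<String> hdfsJars = new ArrayList<>();
        // Candidates for case 1b (copied to the session dir as before).
        final List<String> sessionJars = new ArrayList<>();
    }

    /** Partitions a comma-separated aux jars value by URI scheme. */
    static Split splitByScheme(String auxJarsPath) {
        Split split = new Split();
        if (auxJarsPath == null || auxJarsPath.trim().isEmpty()) {
            return split;
        }
        for (String entry : auxJarsPath.split(",")) {
            String jar = entry.trim();
            if (jar.isEmpty()) {
                continue;
            }
            // URI.create throws IllegalArgumentException on malformed entries.
            String scheme = URI.create(jar).getScheme();
            if ("hdfs".equalsIgnoreCase(scheme)) {
                split.hdfsJars.add(jar);
            } else {
                // Assumption: file:// and scheme-less entries stay session based.
                split.sessionJars.add(jar);
            }
        }
        return split;
    }

    public static void main(String[] args) {
        Split s = splitByScheme(
            "file:///opt/some_local_jar.jar,hdfs:///tmp/some_distributed.jar");
        System.out.println("1a (direct HDFS): " + s.hdfsJars);
        System.out.println("1b (session dir): " + s.sessionJars);
    }
}
```

In an actual patch the hdfs list would presumably be passed to addHdfsResource and the other list to addTempResources, and the HS2-side classloading of aux jars would still need to be handled separately.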
Issue Links
- is related to HIVE-17574 Avoid multiple copies of HDFS-based jars when localizing job-jars (Resolved)