Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 2.2.0, 2.4.0, 3.0.0
- None
- None
- None
Description
Raising this on behalf of Selina Zhang. (For my own reference: YHIVE-1035.)
This concerns the classpaths of Hive actions run from Oozie, and affects scripts that add jars/resources from HDFS locations.
As part of Oozie's "sharelib" deploys, foundation jars (such as Hive jars) tend to be stored in HDFS paths, as are any custom user-libraries used in workflows. An ADD JAR|FILE|ARCHIVE statement in a Hive script causes the following steps to occur:
1. Files are downloaded from HDFS to a local temp dir.
2. UDFs are resolved/validated.
3. All jars/files, including those just downloaded from HDFS, are shipped right back to HDFS-based scratch directories for job submission.
For HDFS-based files, this round trip is wasteful and time-consuming. The third step above should skip shipping HDFS-based resources and instead add them directly to the Tez session.
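To illustrate the proposed behaviour, here is a minimal sketch (not the actual patch, and not Hive's API) of partitioning resources by URI scheme so that only non-HDFS entries get uploaded. The function name `split_resources`, the `remote_schemes` parameter, and the example host and paths are all hypothetical.

```python
from urllib.parse import urlparse


def split_resources(resources, remote_schemes=("hdfs",)):
    """Partition resource URIs into (already_remote, needs_upload).

    Hypothetical helper: anything whose URI scheme is in remote_schemes is
    assumed to be usable in place; everything else must be shipped to the
    scratch directory before job submission.
    """
    already_remote, needs_upload = [], []
    for uri in resources:
        scheme = urlparse(uri).scheme.lower()
        if scheme in remote_schemes:
            already_remote.append(uri)   # reference directly, no re-upload
        else:
            needs_upload.append(uri)     # local file: still has to be shipped
    return already_remote, needs_upload


# Example resource list (paths and namenode address are made up).
resources = [
    "hdfs://namenode:8020/user/oozie/share/lib/hive/hive-exec.jar",
    "file:///tmp/local-udfs.jar",
    "/tmp/lookup.txt",
]
remote, local = split_resources(resources)
```

In this sketch only the two local entries would be copied to the scratch directory; the HDFS jar would be handed to the session as-is. A real implementation would also have to consider whether the resource lives on the same filesystem the job reads from, which this sketch does not check.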
We have a patch that's being used internally at Yahoo.
Attachments
Issue Links
- is duplicated by HIVE-17974 If the job resource jar already exists in the HDFS fileSystem, do not upload! (Patch Available)
- relates to HIVE-27737 Consider extending HIVE-17574 to aux jars (Open)