Hive / HIVE-17574

Avoid multiple copies of HDFS-based jars when localizing job-jars


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 2.2.0, 2.4.0, 3.0.0
    • None
    • None
    • None

    Description

      Raising this on behalf of Selina Zhang. (For my own reference: YHIVE-1035.)

      This concerns the classpaths of Hive actions run from Oozie, and affects scripts that add jars/resources from HDFS locations.

      As part of Oozie's "sharelib" deploys, foundation jars (such as Hive jars) tend to be stored in HDFS paths, as are any custom user-libraries used in workflows. An ADD JAR|FILE|ARCHIVE statement in a Hive script causes the following steps to occur:

      1. Files are downloaded from HDFS to a local temp dir.
      2. UDFs are resolved/validated.
      3. All jars/files, including those just downloaded from HDFS, are shipped right back to HDFS-based scratch-directories, for job submission.

      For HDFS-based files, this is wasteful and time-consuming. Step 3 above should skip shipping HDFS-based resources and instead add them directly to the Tez session.
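      The proposed change to step 3 can be illustrated with a minimal sketch. This is not the attached patch and does not use Hive or Tez APIs; the names `REMOTE_SCHEMES`, `is_hdfs_resource`, and `plan_localization` are hypothetical, and the sketch assumes a resource counts as "HDFS-based" when its URI scheme is `hdfs`. It only shows the decision: resources already on HDFS are referenced in place, and everything else is uploaded to the scratch directory.

```python
from urllib.parse import urlparse

# Assumption: only the "hdfs" scheme marks a resource as already on the cluster.
REMOTE_SCHEMES = {"hdfs"}


def is_hdfs_resource(path: str) -> bool:
    """Return True if the resource URI uses a scheme listed in REMOTE_SCHEMES."""
    return urlparse(path).scheme.lower() in REMOTE_SCHEMES


def plan_localization(resources):
    """Split resources into (to_upload, to_reference).

    to_upload:    local files that still need copying to the scratch directory.
    to_reference: HDFS files that can be passed to the session unchanged.
    Input order is preserved and repeated entries are dropped.
    """
    to_upload, to_reference = [], []
    seen = set()
    for res in resources:
        if res in seen:
            continue  # the same jar added twice should be handled once
        seen.add(res)
        if is_hdfs_resource(res):
            to_reference.append(res)
        else:
            to_upload.append(res)
    return to_upload, to_reference
```

      With this split, a script that adds one jar from an `hdfs://` path and one from a local path would upload only the local jar; the HDFS jar would never be copied back to HDFS.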

      We have a patch that's being used internally at Yahoo.

      Attachments

        1. HIVE-17574.1.patch
          17 kB
          Mithun Radhakrishnan
        2. HIVE-17574.1-branch-2.2.patch
          18 kB
          Mithun Radhakrishnan
        3. HIVE-17574.1-branch-2.patch
          18 kB
          Mithun Radhakrishnan
        4. HIVE-17574.2.patch
          17 kB
          Mithun Radhakrishnan


          People

            Assignee: Chris Drome (cdrome)
            Reporter: Mithun Radhakrishnan (mithun)
            Votes: 0
            Watchers: 5
