Oozie / OOZIE-2277

Honor oozie.action.sharelib.for.spark in Spark jobs


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.0
    • Component/s: None
    • Labels: None

    Description

      Shared libraries specified by oozie.action.sharelib.for.spark are not visible in the Spark job itself. For instance, setting oozie.action.sharelib.for.spark to "spark,hcat" will not make the hcat jars usable in the Spark job. This is inconsistent with other actions (such as Java and MapReduce actions).
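      The example value "spark,hcat" is a comma-separated list of sharelib names. The following is a minimal Java sketch of turning such a value into individual names, assuming entries are separated by commas and surrounding whitespace should be ignored. The class and method names (ShareLibProperty, parseShareLibs) are hypothetical and are not Oozie's own code.

```java
import java.util.ArrayList;
import java.util.List;

public class ShareLibProperty {
    // Hypothetical helper: splits a value such as "spark,hcat" into sharelib names,
    // trimming whitespace and skipping empty entries (e.g. from "spark,,hcat").
    static List<String> parseShareLibs(String value) {
        List<String> names = new ArrayList<>();
        if (value == null) {
            return names;
        }
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(parseShareLibs("spark,hcat")); // prints [spark, hcat]
    }
}
```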

      Because the Spark action simply invokes SparkSubmit, it looks like we would need to pass the jars from the specified sharelibs to SparkSubmit explicitly, so that they are available to the Spark job itself.
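      Collecting "the jars for the specified sharelibs" amounts to taking the union of each selected sharelib's jars. The sketch below assumes the jar URIs for each sharelib are already known and supplied as an in-memory map; a real implementation would list the sharelib directories on HDFS instead. The names ShareLibJars, jarsFor, and jarsByShareLib, and all paths, are hypothetical illustrations.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ShareLibJars {
    // Hypothetical helper: returns the jars for the requested sharelibs.
    // jarsByShareLib is an assumed, pre-populated map of sharelib name -> jar URIs.
    static Set<String> jarsFor(List<String> shareLibs, Map<String, List<String>> jarsByShareLib) {
        Set<String> jars = new LinkedHashSet<>(); // keeps first-seen order and drops duplicates
        for (String lib : shareLibs) {
            List<String> libJars = jarsByShareLib.get(lib);
            if (libJars == null) {
                throw new IllegalArgumentException("Unknown sharelib: " + lib);
            }
            jars.addAll(libJars);
        }
        return jars;
    }

    public static void main(String[] args) {
        // Placeholder paths for illustration only.
        Map<String, List<String>> available = Map.of(
            "spark", List.of("hdfs:///share/lib/spark/spark-core.jar"),
            "hcat", List.of("hdfs:///share/lib/hcat/hive-hcatalog-core.jar"));
        System.out.println(jarsFor(List.of("spark", "hcat"), available));
    }
}
```

An unknown sharelib name fails loudly here rather than being skipped, so that a typo in the property does not silently leave jars off the classpath.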

      One option is to pass the HDFS URLs of those jars to SparkSubmit via the --jars parameter. This is what I've done to work around the issue; it makes for a long SparkSubmit command, but it works.
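      The --jars workaround can be sketched as building a SparkSubmit argument list in which all sharelib jar URLs are joined into a single comma-separated value, which is the form spark-submit's --jars option expects. This is an illustration under assumptions, not the patch attached to this issue: the class name SparkSubmitArgs, the method build, the main class com.example.MyApp, and every path are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SparkSubmitArgs {
    // Hypothetical helper: builds SparkSubmit arguments, passing extra jars
    // through --jars as one comma-separated list.
    static List<String> build(String mainClass, String appJar,
                              List<String> extraJars, List<String> appArgs) {
        List<String> args = new ArrayList<>();
        args.add("--class");
        args.add(mainClass);
        if (!extraJars.isEmpty()) {
            args.add("--jars");
            args.add(String.join(",", extraJars));
        }
        args.add(appJar);      // the application jar follows all options
        args.addAll(appArgs);  // application arguments come last
        return args;
    }

    public static void main(String[] args) {
        List<String> submitArgs = build(
            "com.example.MyApp",
            "hdfs:///user/me/app.jar",
            List.of("hdfs:///share/lib/hcat/a.jar", "hdfs:///share/lib/hcat/b.jar"),
            List.of("input", "output"));
        System.out.println(String.join(" ", submitArgs));
    }
}
```

With many sharelib jars, the --jars value grows long, which matches the observation above that the resulting SparkSubmit command becomes lengthy.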

      Attachments

        1. OOZIE-2277.004.patch (28 kB, Robert Kanter)
        2. OOZIE-2277.002.patch (22 kB, Robert Kanter)
        3. OOZIE-2277.001.patch (3 kB, Robert Kanter)



          People

            Assignee: Robert Kanter (rkanter)
            Reporter: Ryan Brush (rbrush)
            Votes: 0
            Watchers: 11
