Oozie / OOZIE-2547

Add mapreduce.job.cache.files to spark action


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.0
    • Component/s: None
    • Labels: None

    Description

      Currently, we pass jars using the --jars option when submitting a Spark job, and we also add the spark.yarn.dist.files option in yarn-client mode.
      Instead, we can use only the --files option and pass the files listed in mapreduce.job.cache.files. In doing so, we ensure that Spark does not make another copy of files that already exist on HDFS. We have seen files being copied multiple times, causing exceptions such as:

      Diagnostics: Resource hdfs://localhost/user/saley/.sparkStaging/application_1234_123/oozie-examples.jar changed on src filesystem
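
      The idea above can be sketched as a small helper that turns the comma-separated URI list from mapreduce.job.cache.files into a single --files argument for spark-submit, de-duplicating entries so the same file is never shipped twice. This is an illustrative sketch only, not the actual patch; the class and method names (CacheFilesToSparkFiles, buildFilesOption) are hypothetical.

      ```java
      import java.util.LinkedHashSet;
      import java.util.Set;

      // Hypothetical sketch: build a spark-submit --files argument from the
      // comma-separated URI list found in mapreduce.job.cache.files.
      public class CacheFilesToSparkFiles {

          public static String buildFilesOption(String cacheFiles) {
              if (cacheFiles == null || cacheFiles.isEmpty()) {
                  return "";
              }
              // LinkedHashSet preserves insertion order while dropping duplicate URIs,
              // so files already on HDFS are listed once and not re-copied by Spark.
              Set<String> unique = new LinkedHashSet<>();
              for (String uri : cacheFiles.split(",")) {
                  String trimmed = uri.trim();
                  if (!trimmed.isEmpty()) {
                      unique.add(trimmed);
                  }
              }
              return "--files " + String.join(",", unique);
          }

          public static void main(String[] args) {
              String cacheFiles = "hdfs://localhost/user/saley/oozie-examples.jar,"
                      + "hdfs://localhost/user/saley/config.xml,"
                      + "hdfs://localhost/user/saley/oozie-examples.jar";
              // The duplicate oozie-examples.jar entry collapses to one occurrence.
              System.out.println(buildFilesOption(cacheFiles));
          }
      }
      ```

      Passing one de-duplicated --files list is what lets Spark reuse the localized copies instead of re-uploading them to .sparkStaging, which is the source of the "changed on src filesystem" diagnostic shown above.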
      

      Attachments

        1. yarn-cluster_launcher.txt
          238 kB
          Robert Kanter
        2. OOZIE-2547-5.patch
          18 kB
          Satish Saley
        3. OOZIE-2547-4.patch
          18 kB
          Satish Saley
        4. OOZIE-2547-1.patch
          13 kB
          Satish Saley

        Issue Links

        Activity


          People

            Assignee: Satish Saley
            Reporter: Satish Saley
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved:
