SPARK-1825: Windows Spark fails to work with Linux YARN


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.3.0
    • Component/s: YARN
    • Labels: None

    Description

      Spark running on Windows fails to work with YARN running on Linux.
      This is a cross-platform problem.

      This error occurs when 'yarn-client' mode is used.
      (yarn-cluster/yarn-standalone modes were not tested.)

      On the YARN side, Hadoop 2.4.0 resolved the issue:
      https://issues.apache.org/jira/browse/YARN-1824

      However, the Spark YARN module does not incorporate the new YARN API yet, so the problem persists for Spark.

      First, the following source files should be changed:

      • /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
      • /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala

      The changes are as follows:

      • Replace .$() with .$$()
      • For Environment.CLASSPATH.name, replace File.pathSeparator with ApplicationConstants.CLASS_PATH_SEPARATOR (importing org.apache.hadoop.yarn.api.ApplicationConstants is required for this)

      Unless the above changes are applied, launch_container.sh will contain invalid shell script statements (since they will contain Windows-specific separators), and the job will fail. Both replacements are sketched below.
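      A minimal sketch of the two replacements (this is not the actual ClientBase.scala code; the ClasspathSketch object, the populateClasspath* method names, and the "__spark__.jar" entry are only illustrative):

          import java.io.File
          import scala.collection.mutable.HashMap
          import org.apache.hadoop.yarn.api.ApplicationConstants
          import org.apache.hadoop.yarn.api.ApplicationConstants.Environment

          object ClasspathSketch {
            // Before: expanded on the client. On a Windows client this yields
            // "%PWD%;__spark__.jar", which a Linux NodeManager cannot interpret.
            def populateClasspathOld(env: HashMap[String, String]): Unit = {
              env.put(Environment.CLASSPATH.name,
                Environment.PWD.$() + File.pathSeparator + "__spark__.jar")
            }

            // After (requires Hadoop 2.4.0+): $$() emits a "{{PWD}}"-style placeholder
            // and CLASS_PATH_SEPARATOR emits "<CPS>"; the NodeManager substitutes both
            // using the variable syntax and path separator of the cluster's own OS.
            def populateClasspathNew(env: HashMap[String, String]): Unit = {
              env.put(Environment.CLASSPATH.name,
                Environment.PWD.$$() + ApplicationConstants.CLASS_PATH_SEPARATOR + "__spark__.jar")
            }
          }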
      In addition, the following symptoms should be fixed (I could not find the relevant source code):

      • The SPARK_HOME environment variable is copied verbatim into launch_container.sh. It should be converted to the path format of the server OS, or, better, a separate environment variable or configuration variable should be created. A hypothetical sketch of the latter approach follows this list.
      • The '%HADOOP_MAPRED_HOME%' string still exists in launch_container.sh even after the above change is applied; maybe I missed a few lines.
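      As a hypothetical illustration of the separate-variable idea only (the buildContainerEnv helper and the "spark.yarn.serverSparkHome" key are invented for this sketch and do not exist in Spark): the client could skip forwarding its own SPARK_HOME and let a cluster-side setting supply a path already in the server OS's format.

          import scala.collection.mutable.HashMap

          object EnvSketch {
            // Variables whose values are client-local paths and must not be copied
            // verbatim into a Linux container's launch_container.sh.
            private val clientLocalVars = Set("SPARK_HOME")

            // Hypothetical helper; "spark.yarn.serverSparkHome" is an invented key.
            def buildContainerEnv(clientEnv: Map[String, String],
                                  conf: Map[String, String]): HashMap[String, String] = {
              val env = new HashMap[String, String]()
              // Forward everything except client-local path variables.
              clientEnv.filter { case (k, _) => !clientLocalVars.contains(k) }
                .foreach { case (k, v) => env.put(k, v) }
              // Supply SPARK_HOME from a cluster-side setting instead of the
              // Windows client's value.
              conf.get("spark.yarn.serverSparkHome").foreach(env.put("SPARK_HOME", _))
              env
            }
          }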

      I'm not sure whether this is all, since I'm new to both Spark and YARN.


          People

            Assignee: Masayoshi Tsuzuki (tsudukim)
            Reporter: Taeyun Kim (zeodtr)
            Votes: 4
            Watchers: 11

