Spark / SPARK-25920

Avoid custom processing of CLI options for cluster submission

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Spark Submit
    • Labels: None

      Description

      In SparkSubmit, when an app is submitted in cluster mode, there is currently a lot of code specific to each resource manager (RM) that takes the SparkSubmit internals, packages them up in an RM-specific set of "command line options", and parses them back into memory when the RM-specific class is invoked.

      E.g., for YARN:

          // In yarn-cluster mode, use yarn.Client as a wrapper around the user class
          if (isYarnCluster) {
            childMainClass = YARN_CLUSTER_SUBMIT_CLASS
            if (args.isPython) {
              childArgs += ("--primary-py-file", args.primaryResource)
              childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
        [blah blah blah]
      

      For Mesos:

          if (isMesosCluster) {
            assert(args.useRest, "Mesos cluster mode is only supported through the REST submission API")
            childMainClass = REST_CLUSTER_SUBMIT_CLASS
            if (args.isPython) {
              // Second argument is main class
              childArgs += (args.primaryResource, "")
              if (args.pyFiles != null) {
                sparkConf.set("spark.submit.pyFiles", args.pyFiles)
              }
        [blah blah blah]
      

      For k8s:

          if (isKubernetesCluster) {
            childMainClass = KUBERNETES_CLUSTER_SUBMIT_CLASS
            if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
              if (args.isPython) {
                childArgs ++= Array("--primary-py-file", args.primaryResource)
                childArgs ++= Array("--main-class", "org.apache.spark.deploy.PythonRunner")
        [blah blah blah]
      

      These parts of the code are all very similar, and there is no good reason why each RM needs its own processing here. We should simplify this and pass pre-parsed command line options directly to the cluster submission classes, instead of serializing them into RM-specific argument lists and parsing them again.
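      A minimal sketch of what passing pre-parsed options could look like. Every name below (SubmitRequest, ClusterSubmitter, effectiveMainClass, and the two submitter objects) is hypothetical and does not exist in Spark; only the PythonRunner class name is taken from the excerpts above. It is meant to illustrate the direction, not to propose a concrete API.

```scala
// Hypothetical sketch: none of these types exist in Spark today.
object PreParsedSubmitSketch {

  // Options that SparkSubmit has already parsed, held in one RM-agnostic structure
  // instead of being re-encoded as RM-specific "command line options".
  final case class SubmitRequest(
      primaryResource: String,
      mainClass: Option[String],
      isPython: Boolean,
      pyFiles: Seq[String],
      appArgs: Seq[String])

  // Assumed common interface for the cluster submission classes.
  trait ClusterSubmitter {
    def describe(request: SubmitRequest): String
  }

  // Logic that is currently repeated per resource manager lives in one place.
  def effectiveMainClass(request: SubmitRequest): String =
    if (request.isPython) "org.apache.spark.deploy.PythonRunner"
    else request.mainClass.getOrElse(
      throw new IllegalArgumentException("a main class is required for non-Python apps"))

  object YarnSubmitter extends ClusterSubmitter {
    def describe(request: SubmitRequest): String =
      s"yarn: ${effectiveMainClass(request)} ${request.primaryResource}"
  }

  object KubernetesSubmitter extends ClusterSubmitter {
    def describe(request: SubmitRequest): String =
      s"k8s: ${effectiveMainClass(request)} ${request.primaryResource}"
  }

  def main(args: Array[String]): Unit = {
    val request = SubmitRequest(
      primaryResource = "app.py",
      mainClass = None,
      isPython = true,
      pyFiles = Seq("deps.zip"),
      appArgs = Seq("--input", "data"))
    // Each submitter receives the same structured request; nothing is re-parsed.
    Seq[ClusterSubmitter](YarnSubmitter, KubernetesSubmitter)
      .foreach(s => println(s.describe(request)))
  }
}
```

      With a shape like this, the per-RM branches in SparkSubmit would only need to pick the submitter, and differences such as "--class" versus "--main-class" would no longer need to exist.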

              People

              • Assignee: Unassigned
              • Reporter: Marcelo Masiero Vanzin (vanzin)