SPARK-34345: Allow several properties files


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.1, 3.1.1
    • Fix Version/s: None
    • Component/s: Kubernetes, Spark Submit
    • Labels: None

      Description

      Example: we have two applications, A and B. These applications share some common Spark settings and each has some application-specific settings. The idea is to run them like this:

      spark-submit --properties-files common.properties,a.properties A
      spark-submit --properties-files common.properties,b.properties B
      

      Benefits:

      • Common settings can be extracted to a single common file common.properties, so there is no need to duplicate them in a.properties and b.properties
      • Applications can override common settings in their respective custom properties files

      Currently, SparkSubmitArguments.scala provides the following mechanism: command-line arguments like --conf key=value override settings from the properties file. This is not enough, because command-line arguments have to be written into the launcher script; de facto they belong to the binary distribution rather than to the configuration.
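
      As a rough illustration, the proposed --properties-files could merge files left to right, with later files overriding earlier ones and --conf arguments still applied last. A minimal sketch, assuming the file names from the example above (illustrative code, not the actual SparkSubmit implementation):

      import java.io.FileInputStream
      import java.util.Properties

      object MergeProperties {
        // Load each properties file in order; keys from later files override
        // keys from earlier files, mirroring the proposed left-to-right
        // precedence of --properties-files.
        def loadMerged(paths: Seq[String]): Map[String, String] =
          paths.foldLeft(Map.empty[String, String]) { (acc, path) =>
            val props = new Properties()
            val in = new FileInputStream(path)
            try props.load(in) finally in.close()
            var merged = acc
            val names = props.stringPropertyNames().iterator()
            while (names.hasNext) {
              val key = names.next()
              merged += (key -> props.getProperty(key)) // later file wins
            }
            merged
          }

        def main(args: Array[String]): Unit = {
          // common.properties first, then the app-specific file on top;
          // a --conf key=value argument would be applied last, on top of both.
          loadMerged(Seq("common.properties", "a.properties"))
            .foreach { case (k, v) => println(s"$k=$v") }
        }
      }

      Left-to-right precedence matches the order in which the files are listed on the command line, so the most specific file comes last.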

      Consider the following scenario: Spark on Kubernetes, with the configuration provided as ConfigMaps. We could have the following ConfigMaps:

      • a.properties // mounted to the Pod with application A
      • b.properties // mounted to the Pod with application B
      • common.properties // mounted to both Pods, with A and with B

      Meanwhile, the launcher script app-submit.sh is identical for both applications A and B, since it contains no configuration settings:

      spark-submit --properties-files common.properties,${app_name}.properties ...
      

      Alternate solution

      Use Typesafe Config for Spark settings instead of plain properties files. Typesafe Config supports including other files.
      For example, the settings for application A, in a.conf:

      include required("common.conf")
      
      spark.sql.shuffle.partitions = 240
      

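      This alternative is not part of Spark today; an application could, however, load such a HOCON file itself and copy the resolved values into a SparkConf. A minimal sketch, assuming the a.conf from the example above and the com.typesafe:config library on the classpath:

      import java.io.File

      import com.typesafe.config.ConfigFactory
      import org.apache.spark.SparkConf

      object HoconSparkConf {
        def main(args: Array[String]): Unit = {
          // Parse a.conf; its include directive pulls in common.conf, and
          // keys defined in a.conf override the included values.
          val config = ConfigFactory.parseFile(new File("a.conf")).resolve()

          // Copy every spark.* entry into a SparkConf.
          val sparkConf = new SparkConf()
          val entries = config.entrySet().iterator()
          while (entries.hasNext) {
            val entry = entries.next()
            val key = entry.getKey
            if (key.startsWith("spark.")) {
              sparkConf.set(key, entry.getValue.unwrapped().toString)
            }
          }

          sparkConf.getAll.foreach { case (k, v) => println(s"$k=$v") }
        }
      }

      With HOCON, the include is resolved at parse time, so precedence is defined inside the configuration files themselves rather than on the spark-submit command line.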
            People

            • Assignee: Unassigned
            • Reporter: Arseniy Tashoyan (tashoyan)
            • Votes: 0
            • Watchers: 2
