Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.0.1, 3.1.1
- Fix Version/s: None
- Component/s: None
Description
Example: we have two applications, A and B. These applications share some common Spark settings and also have application-specific settings. The idea is to run them like this:
spark-submit --properties-files common.properties,a.properties A
spark-submit --properties-files common.properties,b.properties B
Benefits:
- Common settings can be extracted into a shared file common.properties, with no need to duplicate them in a.properties and b.properties
- Applications can override common settings in their own application-specific properties files (see the sketch after this list)
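A minimal sketch of the intended merge semantics, using a hypothetical helper loadPropertiesFiles (this is not Spark code; it only illustrates that files listed later override keys from files listed earlier):

import java.io.FileInputStream
import java.util.Properties
import scala.jdk.CollectionConverters._

object PropertiesMergeSketch {
  // Load each file in order; keys from later files override earlier ones.
  def loadPropertiesFiles(paths: Seq[String]): Map[String, String] =
    paths.foldLeft(Map.empty[String, String]) { (merged, path) =>
      val props = new Properties()
      val in = new FileInputStream(path)
      try props.load(in) finally in.close()
      merged ++ props.asScala // a later file wins on key conflicts
    }

  // Application A: a.properties overrides keys from common.properties.
  // val conf = loadPropertiesFiles(Seq("common.properties", "a.properties"))
}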
Currently, the following mechanism works in SparkSubmitArguments.scala: command-line arguments like --conf key=value override settings from the properties file. This is not enough, because command-line arguments have to be specified in the launcher script; de facto they belong to the binary distribution rather than to the configuration.
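For illustration, today the only way to override a value from the properties file is on the command line (the value 240 is just an example):

spark-submit --properties-file common.properties --conf spark.sql.shuffle.partitions=240 ...

Here the --conf value takes precedence over the file, but the override ends up baked into the launcher script instead of a configuration file.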
Consider the following scenario: Spark on Kubernetes, where the configuration is provided via ConfigMaps. We could have the following ConfigMaps:
- a.properties // mounted into the Pod running application A
- b.properties // mounted into the Pod running application B
- common.properties // mounted into both Pods, A and B
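As a rough sketch, these ConfigMaps could be created from the properties files like this (the ConfigMap names are illustrative):

kubectl create configmap spark-common-conf --from-file=common.properties
kubectl create configmap spark-app-a-conf --from-file=a.properties
kubectl create configmap spark-app-b-conf --from-file=b.properties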
Meanwhile, the launcher script app-submit.sh is the same for both applications A and B, since it contains no configuration settings:
spark-submit --properties-files common.properties,${app_name}.properties ...
Alternate solution
Use Typesafe Config for Spark settings instead of properties files. Typesafe Config allows including files.
For example, the settings for application A (a.conf):
include required("common.conf")
spark.sql.shuffle.partitions = 240
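A minimal sketch of how an application could resolve such a file with the Typesafe Config library (reading the config explicitly in application code is an assumption here, not something Spark does today):

import java.io.File
import com.typesafe.config.ConfigFactory

// Parse a.conf; the `include required("common.conf")` directive pulls in
// the shared settings, and keys defined later in a.conf override them.
val appConf = ConfigFactory.parseFile(new File("a.conf")).resolve()

// Read a value that a.conf overrides.
val partitions = appConf.getString("spark.sql.shuffle.partitions") // "240"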