Description
I was looking at the SparkPipeline constructor API, trying to maximize the number of settings inherited when a Spark job is submitted with "spark-submit". Submitting that way populates the SparkContext (and JavaSparkContext) with values like the Spark master. If you want to:
- Specify a driver class
- Specify a Hadoop Configuration (rather than picking up the defaults)
- Inherit a pre-populated SparkContext

you currently have to unpack the context by hand with an invocation like:
JavaSparkContext sc = new JavaSparkContext(new SparkConf());
SparkPipeline pipeline = new SparkPipeline(sc.master(), sc.appName(), Driver.class, conf);
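For reference, a self-contained sketch of that workaround under spark-submit; the Driver class name and the freshly-created Configuration are illustrative assumptions, not part of the proposal:

import org.apache.crunch.impl.spark.SparkPipeline;
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class Driver {
  public static void main(String[] args) {
    // Under spark-submit, the master, app name, etc. are already set in the
    // submission environment, so an empty SparkConf inherits them.
    JavaSparkContext sc = new JavaSparkContext(new SparkConf());
    Configuration conf = new Configuration();
    // Today the pre-populated context has to be unpacked field by field:
    SparkPipeline pipeline = new SparkPipeline(sc.master(), sc.appName(), Driver.class, conf);
    // ... construct and run the Crunch pipeline here ...
    pipeline.done();
  }
}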
For convenience, we could add a constructor like the following:
public SparkPipeline(JavaSparkContext sc, String appName, Class<?> driver, Configuration conf)
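A minimal sketch of how that constructor might delegate to the existing four-argument one; the sparkContext field it reuses (instead of building a new context) is an assumption about SparkPipeline's internals:

public SparkPipeline(JavaSparkContext sc, String appName, Class<?> driver, Configuration conf) {
  // Delegate to the existing (String sparkConnect, ...) constructor,
  // then hold on to the caller's pre-built context so its settings survive.
  this(sc.master(), appName, driver, conf);
  this.sparkContext = sc;
}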
We could also drop the appName parameter and derive it from the context, but since the Spark context is not guaranteed to be non-null, doing so might throw an NPE. For the same reason, this line [1] could throw an NPE when pulling hadoopConfiguration() off that object.
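If we did drop appName, a null guard would turn that late NPE into a fail-fast error with a clear message. A hypothetical appName-less overload, assuming java.util.Objects is available (Java 7+):

public SparkPipeline(JavaSparkContext sc, Class<?> driver, Configuration conf) {
  // Reject a null context up front rather than NPE-ing later on
  // sc.appName() or sc.hadoopConfiguration().
  this(Objects.requireNonNull(sc, "sparkContext must not be null").master(),
       sc.appName(), driver, conf);
  this.sparkContext = sc;
}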