Details
Type: Improvement
Status: Resolved
Priority: Normal
Resolution: Fixed
Description
Refactor the base test class (org.apache.sedona.sql.TestBaseScala) so that the SparkSession is created by a method instead of being held in a class-level sparkSession variable. This has several benefits:
- Lazy Initialization: Using a method allows for lazy initialization, which can be beneficial if the creation of the SparkSession is resource-intensive or if it should only be created when needed.
- Flexibility: It provides more flexibility for derived classes to customize or extend the initialization logic without having to override a class-level variable.
- Testability: It can improve testability by allowing the SparkSession to be created in a controlled manner, which can be useful for unit tests.
An example is as follows:
import org.apache.sedona.spark.SedonaContext
import org.apache.spark.sql.SparkSession

trait SparkSessionBuilder {
  // Supplied by the concrete test base class.
  protected val warehouseLocation: String
  protected val resourceFolder: String

  // Builds (or reuses) a session configured according to the given flags.
  def createSparkSession(
      enableBroadcastJoin: Boolean,
      setInference: Boolean,
      enableMetrics: Boolean): SparkSession = {
    val builder = SedonaContext.builder()
      .master("local[*]")
      .appName("sedonasqlScalaTest")
      .config("spark.sql.warehouse.dir", warehouseLocation)

    if (enableBroadcastJoin) {
      builder.config("sedona.join.autoBroadcastJoinThreshold", "-1")
    }

    if (setInference) {
      builder.config("spark.kryoserializer.buffer.max", "64m")
        .config("spark.wherobots.inference.entrance", resourceFolder + "python/udfEntrance.py")
        .config("spark.wherobots.inference.files", resourceFolder + "python/udfDefinition.py")
        .config("spark.wherobots.inference.args", "3")
    }

    if (enableMetrics) {
      builder.config("spark.metrics.conf.*.sink.console.class", "org.apache.spark.metrics.sink.ConsoleSink")
    }

    builder.getOrCreate()
  }
}
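
The lazy-initialization and flexibility points can be shown without Spark on the classpath. The following is only a minimal sketch, not the actual TestBaseScala change: FakeSession, SessionCounter, SessionTestBase, MetricsSuite, and Demo are all hypothetical names introduced for illustration, and FakeSession merely stands in for SparkSession.

```scala
// Hypothetical stand-in for SparkSession so the sketch runs without Spark.
final case class FakeSession(settings: Map[String, String])

// Hypothetical counter that makes the moment of creation observable.
object SessionCounter {
  var created: Int = 0
}

// Hypothetical base trait: the session comes from an overridable method
// and is cached in a lazy val, so it is built only on first access.
trait SessionTestBase {
  protected def sessionSettings: Map[String, String] =
    Map("master" -> "local[*]")

  protected def createSession(): FakeSession = {
    SessionCounter.created += 1
    FakeSession(sessionSettings)
  }

  lazy val session: FakeSession = createSession()
}

// A derived suite customizes the settings by overriding a method,
// without redefining a class-level session variable.
class MetricsSuite extends SessionTestBase {
  override protected def sessionSettings: Map[String, String] =
    super.sessionSettings + ("metrics" -> "console")
}

object Demo {
  def main(args: Array[String]): Unit = {
    val suite = new MetricsSuite
    println(SessionCounter.created)            // 0: nothing built yet
    println(suite.session.settings("metrics")) // console
    println(SessionCounter.created)            // 1: built on first access
  }
}
```

In the real base class, createSession() would presumably delegate to createSparkSession(...) from the trait above, with each suite choosing its own flags.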