As a Spark user, I want to be able to customize my Spark session. I currently want to be able to do the following things:
- I want to be able to add custom analyzer rules. This allows me to implement my own logical constructs; an example of this could be a recursive operator.
- I want to be able to add my own analysis checks. This allows me to catch problems with Spark plans early on. An example of this would be data source-specific checks.
- I want to be able to add my own optimizations. This allows me to optimize plans in different ways, for instance when I use a very different cluster (for example a one-node X1 instance). This supersedes the current spark.experimental methods.
- I want to be able to add my own planning strategies. This supersedes the current spark.experimental methods. This allows me to plan my own physical plan; an example of this would be to plan my own heavily integrated data source (CarbonData for example).
- I want to be able to use my own customized SQL constructs. An example of this would be supporting my own dialect, or being able to add constructs to the current SQL language. I should not have to implement a complete parser, and should be able to delegate to an underlying parser.
- I want to be able to track modifications and calls to the external catalog. I want this API to be stable. This allows me to synchronize with other systems.
This API should modify the SparkSession when the session gets started, and it should NOT change the session in flight.
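
To make the requested behavior concrete, here is a minimal, Spark-free sketch of how such extension points could fit together. Every name in it (`Extensions`, `Session`, `DialectParser`, the `inject*` methods, and the string stand-in for a plan) is hypothetical and chosen only for illustration; it is not a proposed or existing Spark API. It shows two of the requirements above: a custom parser that handles one extra construct and delegates everything else to an underlying parser, and hooks that are collected before the session starts and cannot change a session in flight.

```scala
// Illustrative model only: no Spark dependency, all names are hypothetical.
object SessionExtensionsSketch {
  // Stand-in for a logical plan; a real plan would be a tree, not a string.
  type Plan = String

  trait Parser { def parse(sql: String): Plan }

  // Stand-in for the built-in SQL parser.
  object BaseParser extends Parser {
    def parse(sql: String): Plan = s"Parsed(${sql.trim})"
  }

  // Custom dialect: handles one extra statement, delegates the rest.
  final class DialectParser(delegate: Parser) extends Parser {
    def parse(sql: String): Plan =
      if (sql.trim.equalsIgnoreCase("SHOW ANSWER")) "Answer(42)"
      else delegate.parse(sql)
  }

  // Collects hooks before any session exists.
  final class Extensions {
    private var parserBuilders = Vector.empty[Parser => Parser]
    private var checkRules = Vector.empty[Plan => Unit]
    private var optimizerRules = Vector.empty[Plan => Plan]

    def injectParser(build: Parser => Parser): Unit = parserBuilders :+= build
    def injectCheckRule(check: Plan => Unit): Unit = checkRules :+= check
    def injectOptimizerRule(rule: Plan => Plan): Unit = optimizerRules :+= rule

    // The session receives immutable snapshots, so later inject* calls
    // on this object do not affect a session that was already built.
    def buildSession(): Session = {
      val parser = parserBuilders.foldLeft(BaseParser: Parser)((p, b) => b(p))
      new Session(parser, checkRules, optimizerRules)
    }
  }

  final class Session(
      parser: Parser,
      checks: Vector[Plan => Unit],
      rules: Vector[Plan => Plan]) {
    // Parse, run every analysis check, then apply optimizer rules in order.
    def sql(text: String): Plan = {
      val plan = parser.parse(text)
      checks.foreach(check => check(plan))
      rules.foldLeft(plan)((p, rule) => rule(p))
    }
  }

  def main(args: Array[String]): Unit = {
    val ext = new Extensions
    ext.injectParser(new DialectParser(_))
    ext.injectCheckRule(p => require(!p.contains("DROP"), s"DROP is not allowed: $p"))
    ext.injectOptimizerRule(p => s"Optimized($p)")

    val session = ext.buildSession()
    println(session.sql("SHOW ANSWER")) // prints Optimized(Answer(42))
    println(session.sql("SELECT 1"))    // prints Optimized(Parsed(SELECT 1))
  }
}
```

Passing the underlying parser into the builder function is what lets a dialect stay small: it only needs to recognize its own constructs. Handing the session immutable collections at build time is one simple way to guarantee that extensions apply at startup only.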