Apache Hudi / HUDI-4070

Better Spark SQL default configs


Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels: None

    Description

      Default configs should be:

      1. Optimize defaults for insert/bulk_insert. For example, with the NONE sort mode as the default, a bulk_insert is essentially a plain parquet write plus some additional work to populate the meta columns. An extension of this is to keep a map of minimal optimized configs per operation type. This is partly related to better performing configs in HUDI-2151. (See the datasource sketch after this list.)
      2. Make reasonable assumptions, e.g. for the index type: the bloom index does not rely on any external system, so it is a better default candidate than, say, the HBase index.
      3. Scout all configs declared with noDefaultValue and assign a default where necessary.
      4. Keep Spark SQL and Spark datasource config keys the same as much as possible; otherwise it is operationally difficult for users. Rename/reuse existing datasource keys that are meant for the same purpose. This is related to HUDI-4071 as well. (See the Spark SQL sketch after this list.)
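      A minimal, hypothetical sketch of what points 1 and 2 describe, using the Spark datasource writer in Scala. The table name, path, and columns are placeholders; the sort-mode and index options are set explicitly only to spell out what the proposed defaults would provide out of the box.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object BulkInsertDefaultsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hudi-bulk-insert-defaults-sketch")
      .master("local[*]")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()
    import spark.implicits._

    // Placeholder input data; column names are hypothetical.
    val df = Seq(
      ("id-1", "2022-05-01", 10.5),
      ("id-2", "2022-05-01", 20.0)
    ).toDF("uuid", "ts", "fare")

    df.write.format("hudi")
      .option("hoodie.table.name", "trips_sketch")
      .option("hoodie.datasource.write.recordkey.field", "uuid")
      .option("hoodie.datasource.write.precombine.field", "ts")
      .option("hoodie.datasource.write.operation", "bulk_insert")
      // Point 1: NONE sort mode skips record sorting, so bulk_insert is close to a
      // plain parquet write plus the cost of populating the Hudi meta columns.
      .option("hoodie.bulkinsert.sort.mode", "NONE")
      // Point 2: the bloom index needs no external system, unlike the HBase index,
      // which makes it the safer out-of-the-box choice.
      .option("hoodie.index.type", "BLOOM")
      .mode(SaveMode.Overwrite)
      .save("/tmp/hudi/trips_sketch")

    spark.stop()
  }
}
```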
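      A hypothetical sketch illustrating the key divergence that point 4 calls out: the Spark SQL path configures the record key and precombine field via short table properties (primaryKey, preCombineField), while the datasource path uses the longer hoodie.datasource.write.* keys. Table names and paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object SqlVsDatasourceKeysSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hudi-sql-vs-datasource-keys-sketch")
      .master("local[*]")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Hudi's SQL extension enables the USING hudi / TBLPROPERTIES syntax below.
      .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
      .getOrCreate()
    import spark.implicits._

    // Spark SQL path: key settings are expressed as short table properties.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS trips_sql_sketch (
        |  uuid STRING,
        |  ts BIGINT,
        |  fare DOUBLE
        |) USING hudi
        |LOCATION '/tmp/hudi/trips_sql_sketch'
        |TBLPROPERTIES (
        |  type = 'cow',
        |  primaryKey = 'uuid',
        |  preCombineField = 'ts'
        |)""".stripMargin)

    // Datasource path: the equivalent settings use the longer
    // hoodie.datasource.write.* keys, which is the mismatch point 4 calls out.
    Seq(("id-1", 1L, 10.5)).toDF("uuid", "ts", "fare")
      .write.format("hudi")
      .option("hoodie.table.name", "trips_ds_sketch")
      .option("hoodie.datasource.write.recordkey.field", "uuid")
      .option("hoodie.datasource.write.precombine.field", "ts")
      .mode("append")
      .save("/tmp/hudi/trips_ds_sketch")

    spark.stop()
  }
}
```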

    People

    • Assignee: Sagar Sumit (codope)
    • Reporter: Sagar Sumit (codope)
    • Votes: 0
    • Watchers: 1
