Details
Type: Task
Status: Closed
Priority: Major
Resolution: Done
Description
Default configs should be:
- Optimized for insert/bulk_insert. For example, if the default sort mode is NONE, a bulk_insert is as good as a plain Parquet write plus some additional work for the meta columns. An extension of this is to keep a map of minimal optimized configs per operation type. This is partly related to more performant configs (HUDI-2151).
- Make reasonable assumptions. For example, for the index type, the bloom filter index does not rely on any external system, so it is a better default candidate than, say, the HBase index.
- Scout all configs with noDefaultValue and assign a default if necessary.
- Keep spark-sql and spark datasource config keys the same as much as possible; otherwise it is operationally difficult for the user. Rename or reuse existing datasource keys that are meant for the same purpose. This is related to HUDI-4071 as well.
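
The "map of minimal optimized configs per operation type" idea could look roughly like the sketch below. It is only an illustration: `BASE_DEFAULTS`, `OPERATION_DEFAULTS`, and `resolve_write_options` are hypothetical names, and the chosen values are assumptions taken from the examples in this description (NONE sort mode for bulk_insert, BLOOM as the index type), not Hudi's shipped defaults. The config keys themselves (`hoodie.datasource.write.operation`, `hoodie.bulkinsert.sort.mode`, `hoodie.index.type`) are existing Hudi keys.

```python
from typing import Dict, Optional

# Hypothetical defaults shared by every write operation.
# BLOOM needs no external system, unlike an HBase-backed index.
BASE_DEFAULTS: Dict[str, str] = {
    "hoodie.index.type": "BLOOM",
}

# Hypothetical minimal overrides per operation type.
OPERATION_DEFAULTS: Dict[str, Dict[str, str]] = {
    # NONE sort mode keeps bulk_insert close to a plain Parquet write.
    "bulk_insert": {"hoodie.bulkinsert.sort.mode": "NONE"},
    "insert": {},
    "upsert": {},
}


def resolve_write_options(
    operation: str, user_options: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Merge base defaults, per-operation defaults, and user overrides.

    Precedence, lowest to highest: base defaults, operation defaults,
    user-supplied options. The operation key is always set last.
    """
    if operation not in OPERATION_DEFAULTS:
        raise ValueError(f"unsupported operation: {operation}")
    options = dict(BASE_DEFAULTS)
    options.update(OPERATION_DEFAULTS[operation])
    options.update(user_options or {})
    options["hoodie.datasource.write.operation"] = operation
    return options
```

With PySpark, the resolved dictionary could then be passed along with the usual required table options, e.g. `df.write.format("hudi").options(**resolve_write_options("bulk_insert"))`. Letting user-supplied options win over both default layers keeps the defaults an optimization rather than a constraint.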