Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
See https://github.com/apache/hudi/pull/7912/files for more details
Note that this ticket may be split into separate improvement areas for clarity.
The items below are tracked in separate tickets:
CLEANER_COMMITS_RETAINED: when either "hoodie.cleaner.commits.retained", "hoodie.cleaner.hours.retained", or "hoodie.cleaner.fileversions.retained" is set, should we automatically use the corresponding clean policy?
Clustering around group size: PLAN_STRATEGY_MAX_BYTES_PER_OUTPUT_FILEGROUP
LAYOUT_TYPE
ORDERING_FIELD (PAYLOAD_ORDERING_FIELD_PROP_KEY)
PAYLOAD_CLASS_NAME ("hoodie.compaction.payload.class")
KEYGENERATOR_TYPE (auto inference)
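For the CLEANER_COMMITS_RETAINED question above, the inference could look roughly like the sketch below. This is only an illustration: the class, the local enum, and the infer method are hypothetical and not Hudi's API. The policy names are assumed to mirror Hudi's cleaning policies. The rule that nothing is inferred when more than one retention key is set is also an assumption, since the ticket leaves that case open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class CleanPolicyInference {
    // Local stand-in enum for this sketch; names assumed to mirror Hudi's cleaning policies.
    enum CleanPolicy { KEEP_LATEST_COMMITS, KEEP_LATEST_BY_HOURS, KEEP_LATEST_FILE_VERSIONS }

    // Hypothetical helper: picks a policy only when exactly one retention key is set.
    static Optional<CleanPolicy> infer(Map<String, String> props) {
        List<CleanPolicy> candidates = new ArrayList<>();
        if (props.containsKey("hoodie.cleaner.commits.retained")) {
            candidates.add(CleanPolicy.KEEP_LATEST_COMMITS);
        }
        if (props.containsKey("hoodie.cleaner.hours.retained")) {
            candidates.add(CleanPolicy.KEEP_LATEST_BY_HOURS);
        }
        if (props.containsKey("hoodie.cleaner.fileversions.retained")) {
            candidates.add(CleanPolicy.KEEP_LATEST_FILE_VERSIONS);
        }
        // Assumption: with zero or several keys set, leave the choice to explicit configuration.
        return candidates.size() == 1 ? Optional.of(candidates.get(0)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Only the hours-based key is set, so the hours-based policy is inferred.
        System.out.println(infer(Map.of("hoodie.cleaner.hours.retained", "24")));
    }
}
```

Returning an empty Optional for the ambiguous case keeps the sketch from silently picking a winner; a real change would need to decide on precedence or fail validation.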
Low ROI; these can be punted:
PLAN_STRATEGY_CLASS_NAME to enum
MERGE_ALLOW_DUPLICATE_ON_INSERTS_ENABLE
EQUALITY_SQL_QUERIES
These should be untouched:
EMBEDDED_TIMELINE_SERVER_REUSE_ENABLED (Flink may still need this)
DYNAMODB_ENDPOINT_URL (this is in DynamoDbBasedLockConfig and we only use the config key. There is no ROI for adding infer logic.)
AWS_ACCESS_KEY, AWS_SECRET_KEY (These configs should only be necessary if the corresponding environment variables are not set (to confirm with code); no code change should be needed)