Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
When Spark executes a clustering job, it reads a clustering plan that contains multiple groups, and each group processes many base files or log files. When the parameter `hoodie.clustering.plan.strategy.sort.columns` is configured, those files are read through Spark's `parallelize` method, so every single file read generates its own subtask. This is unreasonable.
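To illustrate the problem, here is a minimal plain-Python sketch (not actual Spark or Hudi code) of how `parallelize`-style slicing turns a file list into tasks: Spark's `ParallelCollectionRDD` assigns item `i` to slice `i * n / numSlices`, so passing one slice per file yields one task per file, while a bounded slice count groups files into far fewer tasks. The `slice_items` helper and file names are illustrative assumptions.

```python
def slice_items(items, num_slices):
    # Mimics Spark's range-based collection slicing: slice i covers
    # items[(i*n)//num_slices : ((i+1)*n)//num_slices].
    n = len(items)
    return [items[(i * n) // num_slices: ((i + 1) * n) // num_slices]
            for i in range(num_slices)]

files = [f"base_file_{i}.parquet" for i in range(8)]

# One slice per file: every file read becomes its own subtask.
per_file_tasks = slice_items(files, len(files))

# Grouped: task count stays bounded regardless of how many files exist.
grouped_tasks = slice_items(files, 2)

print(len(per_file_tasks))  # 8 tasks, each reading a single file
print(len(grouped_tasks))   # 2 tasks, each reading a batch of 4 files
```

With thousands of small base/log files per clustering group, the per-file scheme schedules thousands of tiny tasks, which is the scheduling overhead this issue calls unreasonable.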
Attachments
Issue Links
- links to