Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Won't Fix
Description
When using Hudi to ingest an e-commerce company's item data, a large volume of updates lands in old partitions. If a single record needs to be updated, the whole file it belongs to must be rewritten, so nearly every commit ends up rewriting almost the entire table.
I'm wondering whether Hudi could provide a tool for separating hot and cold data. It would use specific columns (such as create time and update time) to distinguish hot data from cold data, then rebuild the table so that the two are placed in different file groups. After the table is rebuilt, performance should be much better.
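
To make the idea concrete, here is a minimal sketch of the classification step in plain Python. It does not use any Hudi API. The function name `split_hot_cold`, the `update_time` field, and the 7-day window are all assumptions chosen for illustration; a real tool would also have to rewrite the table so that each group lands in its own file groups.

```python
from datetime import datetime, timedelta


def split_hot_cold(records, now, hot_window, time_field="update_time"):
    """Partition records into (hot, cold) by how recently they changed.

    Hypothetical helper: a record is "hot" if its time_field value falls
    within hot_window of now, otherwise it is "cold".
    """
    cutoff = now - hot_window
    hot, cold = [], []
    for record in records:
        if record[time_field] >= cutoff:
            hot.append(record)
        else:
            cold.append(record)
    return hot, cold


# Made-up sample rows standing in for item data.
items = [
    {"item_id": 1, "update_time": datetime(2021, 6, 30)},
    {"item_id": 2, "update_time": datetime(2020, 1, 15)},
    {"item_id": 3, "update_time": datetime(2021, 6, 28)},
]

hot, cold = split_hot_cold(
    items,
    now=datetime(2021, 7, 1),
    hot_window=timedelta(days=7),  # assumed threshold
)
```

If hot and cold records were then written to separate file groups, an update to a hot record would only require rewriting a file from the hot set, which is the benefit the request is after.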