Details
- Type: New Feature
- Status: Open
- Priority: Critical
- Resolution: Unresolved
Description
When we need a marker to measure the progress of data writing, I think storing the watermark as an attribute of the Hudi commit metadata is a reasonable approach.
One of our scenarios: Flink writes data to a Hudi table in real time, and we then use that table for batch computation, so we need a marker to tell whether a partition's data is complete.
For example, job1 is scheduled every hour. At 2022-01-19 02:01:00, job1 starts checking whether partition 20220119/01 of hudi_table1 is complete (Flink writes to hudi_table1 in real time). Once the watermark property in hudi_table1's commit metadata is later than 2022-01-19 02:05:00 (allowing 5 minutes of out-of-order data), we consider partition 20220119/01 complete and can safely run Hive or Flink SQL for batch computation (basically insert table2 select xx from hudi_table1...).
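The completeness check described above could look roughly like the sketch below. It is only an illustration and does not use the Hudi API: a plain dict stands in for the commit's extra metadata, and the key name `watermark`, the timestamp format, and the function names are all assumptions made for this example.

```python
from datetime import datetime, timedelta

# Hypothetical key name; the proposal does not fix one.
WATERMARK_KEY = "watermark"


def partition_end(partition: str) -> datetime:
    """Return the exclusive end time of an hourly partition such as '20220119/01'."""
    start = datetime.strptime(partition, "%Y%m%d/%H")
    return start + timedelta(hours=1)


def is_partition_complete(
    commit_metadata: dict,
    partition: str,
    allowed_lateness: timedelta = timedelta(minutes=5),
) -> bool:
    """Treat the partition as complete once the committed watermark is
    strictly later than the partition end plus the out-of-order allowance."""
    raw = commit_metadata.get(WATERMARK_KEY)
    if raw is None:
        # No watermark recorded in this commit: cannot claim completeness.
        return False
    watermark = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    return watermark > partition_end(partition) + allowed_lateness
```

With these assumptions, job1 would poll the latest commit and start the batch query only once `is_partition_complete(metadata, "20220119/01")` returns true, that is, once the watermark has passed 2022-01-19 02:05:00.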