Details
Type: Sub-task
Status: Closed
Priority: Blocker
Resolution: Fixed
None
None
Description
As part of https://github.com/apache/hudi/pull/12033, we fixed an issue where the log record reader failed to read a data block in some edge cases.
The fix ensures the log record reader accounts for all rollback blocks, regardless of the max instant time it is configured with.
But let's also follow up and see whether we can fix all callers to pass the right value for the max instant time.
Say we have t1.dc and t2.dc, and t2.dc crashed midway.
The current layout is:
base file (t1), lf1 (partially committed data with t2 as the instant time)
Then say we start t5.dc. Just as t5.dc starts, Hudi detects the pending commit and triggers a rollback, and this rollback gets an instant time of t6 (t6.rb). Note that the rollback's instant time is greater than t5, the current ongoing delta commit.
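To make the ordering concrete, here is a small Python sketch. `InstantClock` is hypothetical and only stands in for however the writer actually generates instant times; it assumes instant times are fixed-width timestamp strings (yyyyMMddHHmmssSSS) that grow with creation order, so plain string comparison orders them. Because the rollback is created after t5.dc has been requested, its instant time sorts after t5.

```python
import itertools


class InstantClock:
    """Hypothetical stand-in for the writer's instant-time generator.

    Each call returns a strictly larger 17-digit timestamp string
    (yyyyMMddHHmmssSSS), so instants created later compare greater.
    """

    def __init__(self, start: int = 20240101000000000) -> None:
        self._ticks = itertools.count(start)

    def next_instant(self) -> str:
        return str(next(self._ticks))


clock = InstantClock()
t5 = clock.next_instant()  # t5.dc is requested first
t6 = clock.next_instant()  # while starting t5, the failed t2 is rolled back as t6.rb
print(t5 < t6)  # True: the rollback sorts after the delta commit that triggered it
```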
Once the rollback completes, the layout is:
base file, lf1 (from the partially failed t2.dc), lf3 (rollback command block with t6)
And once t5.dc completes, the layout looks like this:
base file, lf1 (from the partially failed t2.dc), lf3 (rollback command block with t6), lf4 (from t5)
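The effect on the reader can be sketched with a deliberately simplified Python model of the layout above. This is not Hudi's actual reader: `LogBlock`, `scan` and the `apply_all_rollbacks` flag are hypothetical, instant times are small integers instead of timestamp strings, and the real reader does much more (for example, merging records by key). It assumes the pre-fix behavior described here: the scan stops at the first block whose instant time exceeds the max instant time, so with a max of t5 it stops at lf3 (t6) and never reaches lf4.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class LogBlock:
    name: str                     # block label, e.g. "lf1"
    kind: str                     # "data" or "rollback" (command block)
    instant: int                  # instant time carried by the block
    target: Optional[int] = None  # rollback blocks: the instant being rolled back


def scan(blocks: List[LogBlock], max_instant: int, completed: Set[int],
         apply_all_rollbacks: bool) -> List[str]:
    """Simplified log scan; returns names of data blocks that end up merged."""
    valid: List[LogBlock] = []
    for block in blocks:
        beyond_max = block.instant > max_instant
        if block.kind == "rollback":
            if beyond_max and not apply_all_rollbacks:
                break  # pre-fix: stop at the first block past max_instant
            # drop data blocks written by the rolled-back instant
            valid = [b for b in valid if b.instant != block.target]
            continue
        if beyond_max:
            break  # data blocks past max_instant still stop the scan
        if block.instant in completed:  # skip blocks from uncommitted instants
            valid.append(block)
    return [b.name for b in valid]


layout = [
    LogBlock("lf1", "data", 2),                # partial write from failed t2.dc
    LogBlock("lf3", "rollback", 6, target=2),  # t6.rb rolling back t2
    LogBlock("lf4", "data", 5),                # t5.dc
]
completed = {1, 5}  # t1.dc and t5.dc completed; t2.dc never did

print(scan(layout, max_instant=5, completed=completed, apply_all_rollbacks=False))  # []
print(scan(layout, max_instant=5, completed=completed, apply_all_rollbacks=True))   # ['lf4']
```

In this model, passing `max_instant=6` (a latest instant that accounts for the completed rollback) also returns `['lf4']` even with `apply_all_rollbacks=False`, which is why fixing the callers' max instant time is worth following through on.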
Callers involved:
- Global indexes (simple, bloom): deletes in the skipped log block are not applied. Non-global indexes (bloom/simple) read base files for tagging, and since the log holds only updates, tagging is not affected for them.
- Snapshot queries: lf4 is only returned once there is a new commit (roughly eventually consistent behavior).
  - Spark does not factor rollback instants into latestInstantTime.
  - Hive/Trino/Presto: if they all use the InputFormat, BaseHoodieFileIndex#getLatestCompletedInstant handles this.
  - Flink: FormatUtils does not handle this.
- CDC: also affected, regardless of whether the end instant time is set by the user.
- Incremental queries: fixing the last instant time alone may not suffice, since the instant time might be set by the user. We might have to remove the "break" from within the log record reader.
- Indexing: what about all the new indexes added in 1.x?
- Clustering scheduled, or executed inline, right after this: not an issue, since clustering passes its own instant time as latestInstantTime, which passes the check and exposes lf4.
- Compaction scheduled, or executed inline, right after this: handled, since compaction passes a lastInstantTime that includes the rollback timestamp.
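Several of the callers above come down to which latest instant time they hand to the reader. A minimal Python sketch, assuming the completed timeline is given as (instant time, action) pairs; the helper `latest_completed_instant` and its `include_rollbacks` flag are hypothetical, not a Hudi API. Counting completed rollbacks yields t6 instead of t5, which is the kind of value compaction passes and which lets a reader with the pre-fix check apply the t6 rollback block and go on to read lf4.

```python
from typing import Iterable, Optional, Tuple

# Timeline actions whose completion makes new data visible (assumed set).
COMMIT_ACTIONS = frozenset({"commit", "deltacommit", "replacecommit"})


def latest_completed_instant(timeline: Iterable[Tuple[str, str]],
                             include_rollbacks: bool) -> Optional[str]:
    """Return the max completed instant time, optionally counting rollbacks.

    Instant times are fixed-width timestamp strings, so max() on the strings
    matches chronological order.
    """
    actions = (COMMIT_ACTIONS | {"rollback"}) if include_rollbacks else COMMIT_ACTIONS
    times = [instant for instant, action in timeline if action in actions]
    return max(times) if times else None


completed_timeline = [
    ("20240101000100000", "deltacommit"),  # t1.dc
    ("20240101000500000", "deltacommit"),  # t5.dc
    ("20240101000600000", "rollback"),     # t6.rb (rolled back t2.dc)
]
print(latest_completed_instant(completed_timeline, include_rollbacks=False))  # 20240101000500000
print(latest_completed_instant(completed_timeline, include_rollbacks=True))   # 20240101000600000
```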