Details
Type: Task
Status: Closed
Priority: Blocker
Resolution: Duplicate
Description
While working on https://github.com/apache/hudi/pull/4556, I stumbled upon an issue where the LogBlock Scanner hits EOF on log files in tests after a Restore operation has been performed.
The root cause turned out to be the Metadata Table (MT) storing incorrect file sizes after Restore: the sizes recorded in the MT are essentially 2x the actual sizes on the file system (FS).
This appears to be caused by the following:
- The Metadata Table treats new records for the same file as "deltas", adding the new file size onto the size already stored in its records (https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java#L227).
- Upon Restore (which is handled simply as a collection of Rollbacks), we pick the max of the file sizes before and after the operation, regardless of which state we are actually rolling back to (https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java#L254).
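How these two behaviours can combine into the observed 2x size is illustrated by the minimal, hypothetical Java sketch below. It assumes the MT is in sync with the FS before the Restore, that the Restore reports a full file size (the max of the before/after sizes), and that the MT then merges that size additively as a delta. The class, methods, file name, and sizes are illustrative only and are not Hudi's `HoodieMetadataPayload` or `HoodieTableMetadataUtil` API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the MT file listing (not Hudi's API). */
public class MtSizeDoubling {

    // File name -> size currently recorded in the (simplified) metadata table.
    private final Map<String, Long> recordedSizes = new HashMap<>();

    // Assumption: a new record for a file already in the MT is treated as a
    // delta, so its size is added to the previously recorded size.
    void applyRecord(String file, long size) {
        recordedSizes.merge(file, size, Long::sum);
    }

    // Assumption: the rollback reports the max of the sizes before and after
    // the rolled-back operation, regardless of the state being restored to.
    static long rollbackSize(long sizeBefore, long sizeAfter) {
        return Math.max(sizeBefore, sizeAfter);
    }

    long recordedSize(String file) {
        return recordedSizes.getOrDefault(file, 0L);
    }

    public static void main(String[] args) {
        MtSizeDoubling mt = new MtSizeDoubling();
        String log = ".fileId1_001.log.1"; // hypothetical log file name

        // MT is in sync with the FS: the log file is 1024 bytes.
        mt.applyRecord(log, 1024L);

        // Restore reports a full size (max of 512 before, 1024 after) ...
        long reported = rollbackSize(512L, 1024L);
        // ... but the MT merges it as a delta on top of the existing 1024.
        mt.applyRecord(log, reported);

        System.out.println(mt.recordedSize(log)); // prints 2048, i.e. 2x the FS size
    }
}
```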
Proposal
Instead of always picking the max size, we should pick the size the file had right before the operation being rolled back.
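The difference between the current and the proposed size selection can be sketched in Java as follows. This is a hypothetical illustration, not Hudi code: it models a Restore as a sequence of Rollbacks over a list of per-commit file sizes (all names and sizes are assumptions) and compares taking the max across the rolled-back commits with taking the size the file had before the earliest rolled-back commit, i.e. the state being restored to.

```java
import java.util.List;

/** Hypothetical comparison of max-based vs. pre-operation size selection. */
public class RestoreSizeSelection {

    /**
     * sizesAfterCommit.get(i) is the (hypothetical) file size right after commit i.
     * Restoring to commit `target` rolls back commits target+1 .. n-1, one
     * rollback each; each rollback picks max(before, after) as described above.
     */
    static long sizeWithMax(List<Long> sizesAfterCommit, int target) {
        long result = sizesAfterCommit.get(target);
        for (int i = target + 1; i < sizesAfterCommit.size(); i++) {
            long before = sizesAfterCommit.get(i - 1);
            long after = sizesAfterCommit.get(i);
            result = Math.max(result, Math.max(before, after));
        }
        return result;
    }

    /** Proposed: the size right before the first rolled-back commit. */
    static long sizeBeforeRolledBack(List<Long> sizesAfterCommit, int target) {
        return sizesAfterCommit.get(target);
    }

    public static void main(String[] args) {
        List<Long> sizes = List.of(512L, 768L, 1024L); // after c0, c1, c2
        int target = 0; // restore to c0, rolling back c1 and c2
        System.out.println(sizeWithMax(sizes, target));          // prints 1024
        System.out.println(sizeBeforeRolledBack(sizes, target)); // prints 512
    }
}
```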