Details
Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Description
I was trying out the Spark streaming sink with Hudi and saw WARN logs like the ones below.
22/04/09 15:54:16 WARN AbstractTableFileSystemView: Could not read commit details from /tmp/hudi_streaming_kafka/COPY_ON_WRITE/.hoodie/20220409154917240.replacecommit
22/04/09 15:54:16 WARN AbstractTableFileSystemView: Could not read commit details from /tmp/hudi_streaming_kafka/COPY_ON_WRITE/.hoodie/20220409155011647.replacecommit
However, I ran some validations and confirmed that the data was intact. Further investigation showed that this happens just after archival, when the replace commits shown above were among the instants that got archived. So the active timeline is probably not being reloaded somewhere after archival. Since this is only a WARN log and does not cause any correctness issue, I am filing it as a low-priority ticket.
Steps to reproduce:
Run a Spark streaming write to a Hudi COW table with async clustering enabled. Make archival aggressive, and these logs should appear at some point.
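The steps above could be set up roughly as in the sketch below. It is not the exact job used for this ticket: the record key and precombine field names (`id`, `ts`), the helper names, and the specific commit counts are assumptions chosen only to make archival and clustering trigger quickly. The sketch also assumes that Hudi expects the cleaner retention to be lower than `hoodie.keep.min.commits`, which in turn must be lower than `hoodie.keep.max.commits`.

```python
def build_hudi_options(table_name):
    """Writer options for a COW table with async clustering and aggressive archival.

    The field names "id" and "ts" are hypothetical; replace them with the
    columns of the actual streaming source.
    """
    opts = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.recordkey.field": "id",   # hypothetical column
        "hoodie.datasource.write.precombine.field": "ts",  # hypothetical column
        # Schedule and run clustering asynchronously every couple of commits,
        # so replacecommit instants show up on the timeline often.
        "hoodie.clustering.async.enabled": "true",
        "hoodie.clustering.async.max.commits": "2",
        # Keep the active timeline very short so instants are archived quickly.
        "hoodie.cleaner.commits.retained": "1",
        "hoodie.keep.min.commits": "2",
        "hoodie.keep.max.commits": "3",
    }
    retained = int(opts["hoodie.cleaner.commits.retained"])
    keep_min = int(opts["hoodie.keep.min.commits"])
    keep_max = int(opts["hoodie.keep.max.commits"])
    # Assumed ordering constraint between cleaning and archival settings.
    if not retained < keep_min < keep_max:
        raise ValueError("expected commits.retained < keep.min < keep.max")
    return opts


def start_hudi_stream(stream_df, base_path, checkpoint_path, options):
    """Start a structured streaming write of stream_df into a Hudi table.

    stream_df is any streaming DataFrame (for example one read from Kafka).
    """
    return (
        stream_df.writeStream.format("hudi")
        .options(**options)
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .start(base_path)
    )
```

With a streaming DataFrame in hand, calling `start_hudi_stream(df, "/tmp/hudi_streaming_kafka/COPY_ON_WRITE", checkpoint_path, build_hudi_options("hudi_streaming_kafka"))` and letting several micro-batches commit should eventually archive some replace commits, which is when the warnings were observed.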