Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
3.1.0
None
Description
FileStreamSink.addBatch checks the latest batch ID before writing outputs, so that it can skip a batch that was already committed.
Comparing the current batch with the latest batch ID is valid, but getLatest() is designed to return both the batch ID and the content, which means the latest metadata log file is read and deserialized. This introduces heavy latency when the latest batch is a compacted batch.
We could instead just find the metadata log file for the latest batch ID and do the minimal check without reading its content.
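To illustrate the idea, here is a minimal sketch in Python, not the actual Spark change. It assumes a log directory whose file names are the batch ID, optionally followed by a ".compact" suffix. The function names (batch_id_of, latest_batch_id, should_skip_batch, demo) are hypothetical. The point is that the latest batch ID can be derived from a directory listing alone, so the possibly large compacted file is never opened.

```python
import os
import tempfile
from typing import Optional


def batch_id_of(file_name: str) -> Optional[int]:
    """Parse a batch ID from a log file name such as '7' or '9.compact'.

    Returns None for names that do not start with a numeric batch ID
    (for example hidden checksum files).
    """
    stem = file_name.split(".", 1)[0]
    return int(stem) if stem.isdigit() else None


def latest_batch_id(log_dir: str) -> Optional[int]:
    """Find the highest batch ID by listing file names only.

    No file is opened or deserialized, so the cost does not depend on
    how large the latest (possibly compacted) log file is.
    """
    ids = [b for b in map(batch_id_of, os.listdir(log_dir)) if b is not None]
    return max(ids, default=None)


def should_skip_batch(log_dir: str, batch_id: int) -> bool:
    """Return True if batch_id is already covered by a committed batch."""
    latest = latest_batch_id(log_dir)
    return latest is not None and batch_id <= latest


def demo() -> tuple:
    """Build a throwaway log directory and run the minimal check."""
    with tempfile.TemporaryDirectory() as log_dir:
        # Hypothetical layout: two plain batches, one compacted batch,
        # and a hidden checksum file that must be ignored.
        for name in ("0", "1", "2.compact", ".2.compact.crc"):
            with open(os.path.join(log_dir, name), "w") as f:
                f.write("payload that is never read\n")
        return (
            latest_batch_id(log_dir),
            should_skip_batch(log_dir, 2),
            should_skip_batch(log_dir, 3),
        )
```

In this sketch, batch 2 would be skipped because a file for it already exists, while batch 3 would be written. A real implementation would go through the sink's own file system abstraction rather than os.listdir.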