Description
Currently we replay the logs one file at a time, which causes us to build a large queue of pending takes. Additionally, there may be scenarios where this approach simply will not work. Take a queue which is full (as restored from the checkpoint) and two log files:
1:
put
commit
put
commit
2:
take
commit
take
commit
take
commit
Replaying these logs in the current form will not work: replaying file 1 first applies both puts to a queue that is already full, exceeding its capacity before any of the takes in file 2 have been replayed. For these reasons, we should replay the records in the order they were written.
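To make the failure concrete, here is a small Java (16+) simulation of the two files above. It is a sketch, not FileChannel code: each put or take is treated as immediately committed, the queue capacity of 3 is arbitrary, and the `writeOrder` values are a hypothetical global position at which each record was written (takes and puts interleaved, which is the only way the puts could have succeeded against a full queue).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

public class ReplayOrderDemo {
    // Hypothetical record: an operation plus the global position at which it was written.
    // Each op is assumed to be committed immediately, so commits are not modelled.
    record Op(String kind, long writeOrder) {}

    // Replays ops against a bounded queue that starts full (as after a checkpoint).
    // Returns false as soon as a put would exceed capacity or a take finds nothing.
    static boolean replay(List<Op> ops, int capacity) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int i = 0; i < capacity; i++) queue.add(i);
        int nextEvent = capacity;
        for (Op op : ops) {
            if (op.kind().equals("put")) {
                if (queue.size() >= capacity) return false; // queue size exceeded
                queue.add(nextEvent++);
            } else {
                if (queue.isEmpty()) return false;
                queue.poll();
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Op> file1 = List.of(new Op("put", 1), new Op("put", 3));
        List<Op> file2 = List.of(new Op("take", 0), new Op("take", 2), new Op("take", 4));
        int capacity = 3;

        // Current behaviour: all of file 1, then all of file 2.
        List<Op> fileByFile = new ArrayList<>(file1);
        fileByFile.addAll(file2);

        // Proposed behaviour: all records sorted by the order they were written.
        List<Op> inWriteOrder = new ArrayList<>(fileByFile);
        inWriteOrder.sort(Comparator.comparingLong(Op::writeOrder));

        System.out.println("file-by-file: " + replay(fileByFile, capacity)); // false
        System.out.println("write order:  " + replay(inWriteOrder, capacity)); // true
    }
}
```

File-by-file replay fails on the very first put, while the same records replayed in write order never exceed the capacity.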
However, there is currently no way to do this. Each record we write carries two identifiers, a transaction ID and a timestamp, and neither can be used to replay the logs in order. The transaction ID is assigned when the transaction is created, not when the record is written to the log: a client could create a transaction, sleep, and only then do its work. The timestamp is not granular enough, because two records can end up with the same value.
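One possible shape for a usable identifier is a counter stamped onto each record at the moment it is appended to a log, shared across all log files; replay can then merge the files by that value. The Java (16+) sketch below assumes this design: `LogRecord`, `writeOrderId`, `LogWriter`, and `mergeByWriteOrder` are hypothetical names, not existing FileChannel APIs, and the counter is kept in memory only (a real implementation would also need IDs to keep increasing across restarts).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class WriteOrderSketch {
    // Hypothetical record format: writeOrderId is assigned when the record is
    // written to a log file, not when the transaction is created.
    record LogRecord(long writeOrderId, long transactionId, String op) {}

    // Hypothetical writer with one counter shared by all log files. append() is
    // synchronized so that taking an ID and writing the record happen together,
    // which keeps every individual file sorted by writeOrderId.
    static final class LogWriter {
        private long nextId;
        final List<List<LogRecord>> files = new ArrayList<>();

        LogWriter(int fileCount, long firstId) {
            nextId = firstId;
            for (int i = 0; i < fileCount; i++) files.add(new ArrayList<>());
        }

        synchronized LogRecord append(int fileIndex, long transactionId, String op) {
            LogRecord r = new LogRecord(nextId++, transactionId, op);
            files.get(fileIndex).add(r);
            return r;
        }
    }

    // K-way merge of per-file lists (each already sorted by writeOrderId) into a
    // single sequence in the order the records were originally written.
    static List<LogRecord> mergeByWriteOrder(List<List<LogRecord>> files) {
        // Heap entries are {fileIndex, positionInFile}, ordered by that record's ID.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparingLong((int[] c) -> files.get(c[0]).get(c[1]).writeOrderId()));
        for (int f = 0; f < files.size(); f++) {
            if (!files.get(f).isEmpty()) heap.add(new int[] {f, 0});
        }
        List<LogRecord> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] c = heap.poll();
            List<LogRecord> file = files.get(c[0]);
            merged.add(file.get(c[1]));
            if (c[1] + 1 < file.size()) heap.add(new int[] {c[0], c[1] + 1});
        }
        return merged;
    }

    public static void main(String[] args) {
        LogWriter writer = new LogWriter(2, 0);
        // Transaction 7 was created before transaction 8 but writes later, so its
        // records get later IDs: the ID reflects write time, not creation time.
        writer.append(1, 8, "take");
        writer.append(0, 7, "put");
        writer.append(1, 8, "commit");
        writer.append(0, 7, "commit");
        for (LogRecord r : mergeByWriteOrder(writer.files)) System.out.println(r);
    }
}
```

Because a plain counter never repeats, this avoids the duplicate-timestamp problem; the design choice being illustrated is simply that the ordering key is taken at write time rather than at transaction creation.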
Issue Links
- supercedes: FLUME-1431 FileChannel file format has no regression tests (Resolved)
- links to