Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
Description
File Channel has scaled well enough that people now run channels holding hundreds of millions of events. At this scale, replay can be extremely slow even between checkpoints, because the remove() method in FlumeEventQueue moves every pointer that follows the one being removed (a single remove can cause 99 million+ moves in a channel of 100 million events). There are several ways to improve this. One is to defer the moves to the end of replay, similar to a compaction. Another is to use the fact that all removes happen from the top of the queue: move the first "k" events out to a hash set and remove from there. We can find k using the write ID of the last checkpoint and the current one.
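A minimal sketch of the two costs being compared, not the actual FlumeEventQueue code. The class and method names (`ReplayRemoveSketch`, `removeByShifting`, `replayWithHeadSet`) are hypothetical, pointers are assumed to be unique `long` values, and `k` is taken as an input rather than derived from checkpoint write IDs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; names and layout are assumptions, not Flume's API. */
public class ReplayRemoveSketch {

    /**
     * Removes one pointer by shifting every later slot one position left.
     * Returns the number of slots moved, or -1 if the pointer is absent.
     */
    public static int removeByShifting(long[] queue, int size, long pointer) {
        for (int i = 0; i < size; i++) {
            if (queue[i] == pointer) {
                int moved = size - i - 1;
                System.arraycopy(queue, i + 1, queue, i, moved);
                return moved;
            }
        }
        return -1;
    }

    /**
     * Copies the first k pointers into a hash set, applies all removes to the
     * set, then rebuilds the queue once (surviving head entries, then the tail).
     * Assumes every pointer in the queue is unique.
     */
    public static List<Long> replayWithHeadSet(long[] queue, int size, int k, long[] removes) {
        int headSize = Math.min(k, size);
        // LinkedHashSet preserves insertion order, so survivors keep queue order.
        Set<Long> head = new LinkedHashSet<>();
        for (int i = 0; i < headSize; i++) {
            head.add(queue[i]);
        }
        for (long pointer : removes) {
            head.remove(pointer); // constant time on average, no shifting
        }
        List<Long> result = new ArrayList<>(head);
        for (int i = headSize; i < size; i++) {
            result.add(queue[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] queue = {10L, 11L, 12L, 13L, 14L};
        int moved = removeByShifting(queue.clone(), queue.length, 10L);
        System.out.println("slots moved by one head remove: " + moved);
        System.out.println("queue after set-based replay: "
                + replayWithHeadSet(queue, queue.length, 3, new long[] {10L, 12L}));
    }
}
```

With shifting, each remove near the head costs roughly the size of the queue. With the head set, each remove is a hash lookup and the queue is rebuilt only once at the end of replay.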
Attachments
Issue Links
- is related to
  - FLUME-2118 Occasional multi-hour pauses in file channel replay (Resolved)
  - FLUME-2260 Recommend Dual Checkpoints in file channel documentation (Open)
- links to