My thoughts on Flow to disk
1) The easiest approach to start with is to set capacity limits on queues, such that if the total size of messages on a queue exceeds its capacity, messages at the tail of the queue start flowing to disk rather than remaining in memory. To ensure that the broker itself never runs out of memory, the sum of the capacities of all queues must not exceed the total capacity of the broker. One can start to get clever here by progressively reducing the capacity of individual queues as the sum of the individual capacities approaches the broker's total.
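As a purely illustrative sketch of that "clever" part (effectiveCapacity and brokerCapacity are hypothetical names, not existing broker code), one simple scheme is to scale each queue's configured capacity down proportionally once the configured capacities over-commit the broker:

    #include <cstdint>

    // Scale per-queue capacities so that their sum never exceeds what the
    // broker as a whole can hold in memory (ignoring overflow in the
    // multiplication for very large values - this is only a sketch).
    uint64_t effectiveCapacity(uint64_t configured,
                               uint64_t sumOfAllConfigured,
                               uint64_t brokerCapacity)
    {
        if (sumOfAllConfigured <= brokerCapacity) return configured;
        // Over-committed: reduce every queue's share proportionally.
        return (configured * brokerCapacity) / sumOfAllConfigured;
    }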
2) I strongly believe that we should not confuse the transactional message log (which is used to replay persistent messages after a broker failure) with the random(ish) access store. In particular the current implementation where we try to piggyback flow-to-disk for persistent messages on top of our transaction log is a cause of constant issues for us. It is better (I think) to have a uniform solution across transient and persistent messages for flow-to-disk; and for this to be separated completely from the transaction log.
3) We should not overcomplicate things for ourselves by trying to be exact in our calculations of memory used. In particular where a message is placed on many queues we should not try to account for the fact that it need be stored only once in memory. Reasoning about memory usage is much easier if we make the simplifying assumption that each message is duplicated on each queue.
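Under that simplifying assumption the accounting becomes trivial: a queue's notional usage is just the sum of the sizes of its own entries, with no cross-queue sharing to track. A minimal sketch (QueueEntry here is a stand-in, not the broker's actual class):

    #include <cstdint>
    #include <deque>

    struct QueueEntry { uint64_t headerSize; uint64_t contentSize; };

    // Notional memory usage of one queue: a message's full size counts on
    // every queue it sits on, i.e. deliberately no sharing adjustment.
    uint64_t notionalUsage(const std::deque<QueueEntry>& entries)
    {
        uint64_t total = 0;
        for (const QueueEntry& e : entries)
            total += e.headerSize + e.contentSize;
        return total;
    }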
4) Discarding transient messages, killing slow consumers, throttling publishers, etc. are all separate strategies for dealing with expanding queue size and should be considered separately from flow to disk (they are not alternatives, since you may wish to combine all these strategies).
5) In the first instance we should only consider flowing the message header and content to disk, not the structure of the queue itself. Since every message still requires an in-memory queue entry, memory will put an upper bound on the number of messages that can logically be in the broker at any one time - and therefore, without implementing other strategies (such as producer throttling), it will still theoretically be possible to make the broker run out of memory. Flowing the queue structure to disk should be part of a second phase of implementation.
[What do I mean by this? That in the first instance we should keep the list of entries in memory, with each entry pointing either to an in-memory data structure or to an on-disk data structure. In the second phase the list itself may be partly stored on disk.]
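In code terms, phase one might look roughly like this (a sketch under the assumptions above, not the broker's actual classes): the list of entries always stays in memory, and each entry either owns its message data or records where that data now lives on disk.

    #include <memory>
    #include <string>

    struct MessageData { std::string header; std::string content; };

    struct QueueEntry {
        std::unique_ptr<MessageData> data; // non-null while held in memory
        std::string onDiskPath;            // set once flowed to disk
        bool flowedToDisk() const { return !data; }
    };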
6) The method of storing message data on disk does not need to be complicated. Data that has been flowed to disk can be deleted on startup (since the transaction log is what preserves durability). Something as simple as a directory per queue, with a file for each message (or separate files for the header and each body chunk), will be sufficient. The file names can be derived from the message id, the directory names from the queue name.
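For example (a sketch of the naming scheme just described; dataDir is a hypothetical root directory whose contents can simply be deleted wholesale on startup):

    #include <cstdint>
    #include <string>

    // <dataDir>/<queueName>/<messageId>.hdr holds the header;
    // <dataDir>/<queueName>/<messageId>.body.<n> holds the n-th body chunk.
    std::string messageDir(const std::string& dataDir, const std::string& queueName)
    {
        return dataDir + "/" + queueName;
    }

    std::string headerFile(const std::string& dir, uint64_t messageId)
    {
        return dir + "/" + std::to_string(messageId) + ".hdr";
    }

    std::string bodyChunkFile(const std::string& dir, uint64_t messageId, unsigned chunk)
    {
        return dir + "/" + std::to_string(messageId) + ".body." + std::to_string(chunk);
    }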
So implementation-wise I would suggest the following:
1) Remove shared state from the Message class and move everything into QueueEntries (this allows a message to be flowed to disk on one queue while staying in memory on another).
2) Break apart the interface for the store so that it covers only the logging of durable transactional events (message data arriving, enqueues, dequeues) - a rough sketch of such an interface follows this list
3) Move reference counting into the store/store facade
- At this point we will have removed our current (fragile) flow-to-disk capability for persistent messages... and all messages will be held in memory while live
4) Add a capacity property to queues
5) Update queue entries / message handles so that a queue entry can be explicitly flowed to disk, exposes a boolean "flowed-to-disk" property, and can be restored from disk
6) Add capabilities to queues to shrink their in-memory profile by flowing queue entries to disk (from the tail upward) until they are under a given notional memory usage - see the sketch below
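To make steps 2, 5 and 6 concrete, here is a rough sketch (all names are illustrative, not the broker's actual API): the store interface shrinks to the three durable events, and the queue flows entries out from the tail until it is back under its target.

    #include <cstdint>
    #include <deque>
    #include <string>

    // Step 2: the store logs only the durable transactional events;
    // flow-to-disk is no longer its concern.
    class TransactionLog {
    public:
        virtual ~TransactionLog() {}
        virtual void messageReceived(uint64_t messageId, const std::string& data) = 0;
        virtual void enqueued(uint64_t messageId, const std::string& queueName) = 0;
        virtual void dequeued(uint64_t messageId, const std::string& queueName) = 0;
    };

    // Step 5: an entry knows whether it has been flowed to disk, and can be
    // explicitly flowed out or restored.
    struct QueueEntry {
        uint64_t size;
        bool onDisk;
        void flowToDisk() { /* write header + content, release memory */ onDisk = true; }
        void restore()    { /* read header + content back from disk */ onDisk = false; }
    };

    // Step 6: flow entries to disk from the tail upward until the queue's
    // notional in-memory usage drops below the given target.
    void shrinkToTarget(std::deque<QueueEntry>& entries, uint64_t target)
    {
        uint64_t inMemory = 0;
        for (const QueueEntry& e : entries)
            if (!e.onDisk) inMemory += e.size;

        for (auto i = entries.rbegin(); i != entries.rend() && inMemory > target; ++i) {
            if (!i->onDisk) {
                i->flowToDisk();
                inMemory -= i->size;
            }
        }
    }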
Then you can start to examine ways of making the broker manage the memory usage of queues more dynamically and actively.
hope this helps,