Details
Improvement
Status: Triage Needed
Normal
Resolution: Unresolved
Description
Redesign the progress mechanisms to be memory efficient, use fewer messages, and resolve dependency chains promptly.
The SimpleProgressLog had a number of problems:
- It polled for progress without attempting to determine whether progress could realistically be made, so:
  - as the number of pending transactions grew, the proportion of useful work dropped (many could not make progress until earlier transactions completed)
  - each transaction in a dependency chain could recover only, on average, half a poll interval after the previous transaction completed
- It requested full transaction state from every replica on each attempt
- It maintained a lot of in-memory state
- Polling happened en masse, allowing little per-transaction control
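To make the dependency-chain cost concrete, here is a rough estimate rather than a measurement. The symbols are introduced only for this illustration: T is the poll interval, n is the length of the chain, W_i is the wait between transaction i becoming able to proceed and the next poll noticing it, and D_n is the total delay added by polling. It assumes each W_i is uniformly distributed over one poll interval.

```latex
W_i \sim \mathrm{Uniform}(0, T), \qquad
\mathbb{E}[D_n] = \sum_{i=1}^{n} \mathbb{E}[W_i] = \frac{nT}{2}
```

With hypothetical values T = 1 s and n = 100, polling alone would add about 50 s before the end of the chain completes, regardless of how quickly each transaction executes.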
We also separately maintained fairly expensive per-command listener state that negatively affected our command loading and caching.
The new DefaultProgressLog makes use of several new features: LocalListeners, RemoteListeners, Timers and Await messages.
- LocalListeners provide a memory-efficient collection for managing each CommandStore’s transaction listeners, with dedicated record keeping for inter-transaction relationships.
- RemoteListeners provide a mechanism for request/response pairs that may be separated by longer than the normal Cassandra message timeout, and require minimal state on sender and recipient. This permits replicas to cheaply update their local state machine as soon as distributed information becomes available.
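The record keeping for inter-transaction relationships could look roughly like the sketch below. It is not the Accord LocalListeners implementation: the class name `TxnListenerIndex`, its methods, and the use of plain `long` transaction ids are all assumptions made for illustration. The idea it shows is that a waiting transaction is indexed by the transaction it depends on, so completing a dependency wakes exactly its waiters and no polling is needed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the Accord LocalListeners implementation.
final class TxnListenerIndex {
    // dependency txn id -> ids of the transactions waiting on it
    private final Map<Long, List<Long>> waitersByDependency = new HashMap<>();

    /** Record that `waiter` cannot progress until `dependency` completes. */
    void register(long waiter, long dependency) {
        waitersByDependency.computeIfAbsent(dependency, k -> new ArrayList<>()).add(waiter);
    }

    /** Remove and return the waiters of `dependency`, in registration order. */
    List<Long> complete(long dependency) {
        List<Long> waiters = waitersByDependency.remove(dependency);
        return waiters == null ? Collections.emptyList() : waiters;
    }

    /** Number of dependencies that still have at least one waiter. */
    int pendingDependencies() {
        return waitersByDependency.size();
    }
}
```

Because state is kept only per outstanding dependency, memory scales with the number of blocked relationships rather than with a listener object attached to every loaded command.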
The DefaultProgressLog tracks each transaction with separate timers to handle per-transaction scheduling, backoff, etc., and a succinct state machine. To reduce overhead, correspondence is preferentially limited to a handful of replicas, and to the home shard where appropriate.
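Per-transaction timers with backoff might be sketched as follows. This is not the Accord Timers or DefaultProgressLog code: `TxnTimers`, its methods, the millisecond clock passed in by the caller, and the doubling backoff policy are all assumptions chosen for illustration. Each transaction carries its own attempt count and deadline, so one slow transaction backs off without affecting the schedule of any other.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch; not the Accord Timers or DefaultProgressLog code.
final class TxnTimers {
    private static final class Entry {
        final long txnId;
        final long deadlineMillis;
        Entry(long txnId, long deadlineMillis) {
            this.txnId = txnId;
            this.deadlineMillis = deadlineMillis;
        }
    }

    private final long baseDelayMillis;
    private final long maxDelayMillis;
    private final Map<Long, Integer> attempts = new HashMap<>();
    private final Map<Long, Long> deadlines = new HashMap<>();
    // Ordered by deadline; superseded entries are skipped lazily in expired().
    private final PriorityQueue<Entry> queue =
        new PriorityQueue<>((a, b) -> Long.compare(a.deadlineMillis, b.deadlineMillis));

    TxnTimers(long baseDelayMillis, long maxDelayMillis) {
        this.baseDelayMillis = baseDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    /** Schedule (or reschedule) a transaction; returns the delay that was applied. */
    long schedule(long txnId, long nowMillis) {
        int previousAttempts = attempts.merge(txnId, 1, Integer::sum) - 1;
        long delay = baseDelayMillis;
        // Double once per earlier attempt, stopping once the cap is reached.
        for (int i = 0; i < previousAttempts && delay < maxDelayMillis; i++) delay *= 2;
        delay = Math.min(delay, maxDelayMillis);
        long deadline = nowMillis + delay;
        deadlines.put(txnId, deadline);
        queue.add(new Entry(txnId, deadline));
        return delay;
    }

    /** Forget a transaction entirely, e.g. once it has completed. */
    void cancel(long txnId) {
        attempts.remove(txnId);
        deadlines.remove(txnId);
    }

    /** Return the transactions whose current deadline has passed. */
    List<Long> expired(long nowMillis) {
        List<Long> due = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().deadlineMillis <= nowMillis) {
            Entry e = queue.poll();
            Long current = deadlines.get(e.txnId);
            // Ignore entries that were cancelled or replaced by a later schedule() call.
            if (current != null && current == e.deadlineMillis) {
                deadlines.remove(e.txnId);
                due.add(e.txnId);
            }
        }
        return due;
    }
}
```

A caller would invoke `expired(now)` from a single scheduler tick, attempt progress for each returned transaction, and call `schedule` again only for those that remain blocked.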