Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
0.6
None
Description
A clustered broker maintains consistency of replicated objects by only modifying them in a "replication safe" thread context: while receiving an update or dispatching cluster events.
A repeated source of cluster bugs is broker code that unwittingly modifies replicated objects in an unsafe context such as a timer thread. These bugs are intermittent race conditions that are hard to track down.
Proposal: annotate broker code with assertions to identify code that modifies replicated state and log/abort if such code is called in an unsafe context:
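// New class:
namespace broker {

class Replicated {
  protected:
    // Logs or aborts if called outside a replication-safe thread context.
    void assertReplicationSafe();
};

// Existing classes:
class Queue : public Replicated { // Mark Queue as state that may be replicated.
  public:
    void someQueueModifier() {
        assertReplicationSafe(); // Check the context before touching replicated state.
        // ... modify the queue ...
    }
};

} // namespace broker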
The assertion is cheap: just testing a thread-local boolean value. In a non-clustered broker it does nothing.
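To illustrate why the check is cheap, here is a minimal sketch of how the thread-local test could work. It is not the actual broker implementation: only Replicated, Queue, someQueueModifier and assertReplicationSafe come from the proposal above. The detail namespace, setClustered, violationCount, ReplicationSafeScope and the depth counter are hypothetical names invented for this sketch, and a violation is logged and counted here rather than aborting.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

namespace broker {

namespace detail {
bool clustered = false;                     // Assumed: set once at startup, before other threads run.
thread_local bool replicationSafe = false;  // Per-thread flag: is this thread in a safe context?
std::atomic<int> violations(0);             // Sketch only: count violations instead of aborting.
}

// Hypothetical hooks for enabling the check and inspecting the result.
void setClustered(bool on) { detail::clustered = on; }
int violationCount() { return detail::violations.load(); }

// Hypothetical RAII guard: the cluster code would hold one of these while
// receiving an update or dispatching cluster events.
class ReplicationSafeScope {
  public:
    ReplicationSafeScope() : previous(detail::replicationSafe) {
        detail::replicationSafe = true;
    }
    ~ReplicationSafeScope() {
        assert(detail::replicationSafe);     // Nothing should have cleared the flag meanwhile.
        detail::replicationSafe = previous;  // Restore, so nested scopes work.
    }
    ReplicationSafeScope(const ReplicationSafeScope&) = delete;
    ReplicationSafeScope& operator=(const ReplicationSafeScope&) = delete;
  private:
    bool previous;
};

class Replicated {
  protected:
    // Two boolean tests on the fast path; does nothing in a non-clustered broker.
    void assertReplicationSafe() const {
        if (detail::clustered && !detail::replicationSafe) {
            ++detail::violations;
            std::fprintf(stderr, "replicated state modified in an unsafe thread context\n");
        }
    }
};

class Queue : public Replicated {
  public:
    void someQueueModifier() {
        assertReplicationSafe();  // Check the context before modifying replicated state.
        ++depth;                  // Stand-in for a real modification.
    }
    int getDepth() const { return depth; }
  private:
    int depth = 0;
};

} // namespace broker
```

In this sketch, a timer thread that calls someQueueModifier() never constructs a ReplicationSafeScope, so its thread-local flag stays false and the call is reported. The same call made from the cluster dispatch path, inside a scope, passes silently.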
This technique has already proven valuable in debugging a recent bug; putting the assertions permanently in the code should speed up debugging of future bugs.
This would be the beginning of a formal contract between the broker code and the cluster that should make things more maintainable in the long run.