We've had various efforts to improve the ordering guarantees for HBase replication, most notably Serial Replication.
I think in many cases guaranteeing a Total Replication Order is not required, but a simpler Causal Replication Order is sufficient.
Specifically we would guarantee causal ordering for a single Rowkey. Any changes to a Row - Puts, Deletes, etc - would be replicated in the exact order in which they occurred in the source system.
Unlike total ordering, this can be accomplished with only local region server control.
I don't have a full design in mind; let's discuss here. It should be sufficient to do the following:
- RegionServers only adopt the replication queues from other RegionServers for regions they (now) own. This requires log splitting for replication.
- RegionServers ship all edits for queues adopted from other servers before any of their "own" edits are shipped.
It's probably a bit more involved, but should be much cheaper than the total ordering provided by serial replication.
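
To make the two rules above concrete for discussion, here is a minimal sketch in Java. It is not HBase code: CausalShipOrder, Edit, adopt, append, next and drain are all hypothetical names, and it assumes log splitting has already produced the recovered edits in their original source order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Hypothetical model for discussion only; none of these names are HBase APIs.
final class CausalShipOrder {

    /** A WAL edit reduced to the fields that matter for ordering. */
    static final class Edit {
        final String region;
        final String rowKey;
        final long seqId;

        Edit(String region, String rowKey, long seqId) {
            this.region = region;
            this.rowKey = rowKey;
            this.seqId = seqId;
        }
    }

    private final Set<String> ownedRegions;
    // Edits taken over from other region servers, in source order.
    private final Deque<Edit> adopted = new ArrayDeque<>();
    // Edits written by this region server itself.
    private final Deque<Edit> own = new ArrayDeque<>();

    CausalShipOrder(Set<String> ownedRegions) {
        this.ownedRegions = ownedRegions;
    }

    /** Rule 1: adopt only the edits for regions this server now owns. */
    void adopt(List<Edit> recovered) {
        for (Edit e : recovered) {
            if (ownedRegions.contains(e.region)) {
                adopted.add(e);
            }
        }
    }

    /** Record an edit written locally. */
    void append(Edit e) {
        own.add(e);
    }

    /**
     * Rule 2: return the next edit to ship. Own edits are held back
     * until every adopted edit has been shipped. Returns null when idle.
     */
    Edit next() {
        Edit e = adopted.poll();
        return e != null ? e : own.poll();
    }

    /** Ship everything currently pending, in shipping order. */
    List<Edit> drain() {
        List<Edit> shipped = new ArrayList<>();
        for (Edit e = next(); e != null; e = next()) {
            shipped.add(e);
        }
        return shipped;
    }
}
```

In this model a local edit to a row can never overtake an older adopted edit to the same row, which is the per-Rowkey guarantee. The sketch deliberately ignores chained failures (queues that were themselves adopted earlier) and it holds back own edits for all regions, not just the recovered ones; both are probably part of the "more involved" bits.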