RB does not seem to be working for me.
I am attaching an updated and rebased patch here instead. I'll put it up on GitHub or Phabricator if RB is not fixed.
A bit of explanation for the patch is needed.
When the primary region server executes flushes or compactions, or opens / closes the region, these events are written to the WAL (HBASE-11511, HBASE-11512, HBASE-2231). These event edits are also sent to the secondary region replica through the async WAL replication added in HBASE-11568.
In this patch, we replay flushes from the primary region replica by creating a corresponding snapshot with the same seqId as the primary. On replaying the flush commit marker, we find the preceding flush prepare, drop the memstore snapshot, and pick up the new files.
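To make the prepare / commit replay concrete, here is a minimal sketch of the idea. It is not the code in the patch: the class ReplicaStoreSketch, its methods, and the use of plain strings for edits and file names are all hypothetical, and it only models the simple case where a commit marker matches the snapshot taken by the preceding prepare.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a secondary replica's store; not the HBase API.
class ReplicaStoreSketch {
    // Edits in the active memstore, keyed by sequence id.
    private final TreeMap<Long, String> memstore = new TreeMap<>();
    // Snapshot taken on flush prepare, or null if no flush is in progress.
    private TreeMap<Long, String> snapshot;
    private long snapshotSeqId = -1;
    // Names of the flushed files this replica currently serves from.
    private final List<String> storeFiles = new ArrayList<>();

    void replayEdit(long seqId, String value) {
        memstore.put(seqId, value);
    }

    // Flush prepare marker: move every edit with seqId <= flushSeqId
    // out of the active memstore and into a snapshot.
    void replayFlushPrepare(long flushSeqId) {
        snapshot = new TreeMap<>(memstore.headMap(flushSeqId, true));
        memstore.headMap(flushSeqId, true).clear();
        snapshotSeqId = flushSeqId;
    }

    // Flush commit marker: drop the snapshot taken for the same seqId
    // and pick up the files the primary wrote.
    void replayFlushCommit(long flushSeqId, List<String> flushedFiles) {
        if (snapshot != null && snapshotSeqId == flushSeqId) {
            snapshot = null;
            snapshotSeqId = -1;
        }
        storeFiles.addAll(flushedFiles);
    }

    int memstoreSize() { return memstore.size(); }
    int snapshotSize() { return snapshot == null ? 0 : snapshot.size(); }
    List<String> getStoreFiles() { return storeFiles; }
}
```

Commit markers that arrive without a matching prepare are deliberately left out of this sketch; here they only add the new files.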
Region open events are used as coordination points: all of the region's flush files are written to the WAL when the primary opens it. Replaying that event on the secondary also picks those files up and cleans up the memstore state accordingly (see patch).
Since replication can deliver events out of order, region open event markers are also used as a cut-off: whenever we replay a region open event, we skip all earlier entries so that flush / compaction markers with older seqIds do not corrupt the state.
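A rough illustration of that skipping rule follows. The class RegionOpenBarrierSketch and its method names are made up for this comment and are not part of the patch; the sketch assumes the check is a plain seqId comparison against the last replayed region open event.

```java
// Hypothetical sketch: ignore markers older than the last replayed region open.
class RegionOpenBarrierSketch {
    // seqId of the most recent region open event replayed so far.
    private long lastRegionOpenSeqId = -1;

    // Record a region open event; a stale (older) open event is ignored.
    void replayRegionOpen(long seqId) {
        if (seqId > lastRegionOpenSeqId) {
            lastRegionOpenSeqId = seqId;
        }
    }

    // A flush / compaction marker is replayed only if it is newer
    // than the last region open event we have seen.
    boolean shouldReplay(long markerSeqId) {
        return markerSeqId > lastRegionOpenSeqId;
    }
}
```

With this rule, a flush marker with seqId 7 that is delivered after a region open event with seqId 10 is dropped instead of being applied on top of the newer state.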
Some earlier discussions about issues related to this patch can also be found at the design doc at
The meat of the changes is in HRegion. We have split the internalFlushCache() method into two parts: the first does the prepare, the second does the flush and commit. These two parts are used when replaying the flush start and flush commit events coming from the primary region replica. There are also some changes for handling the seqIds from the replayed WAL edits. Since the secondaries track the primary using the same seqIds, they also advance their own sequenceId AtomicLong and their memstore read points. Finally, TestHRegionReplayEvents.java adds a lot of tests for the different scenarios (see the comment in the test).
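As a rough picture of the seqId tracking, the sketch below advances a replica's counters to the highest seqId replayed so far. The class ReplicaSeqIdSketch and its fields are hypothetical stand-ins, and moving the read point in the same step as the sequence id is a simplifying assumption, not a description of what HRegion does in the patch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a secondary following the primary's sequence ids.
class ReplicaSeqIdSketch {
    private final AtomicLong sequenceId = new AtomicLong(-1);
    private final AtomicLong memstoreReadPoint = new AtomicLong(-1);

    // Called for every replayed WAL edit. Edits may arrive out of order,
    // so the counters only ever move forward (max of old and new value).
    void onReplayedEdit(long primarySeqId) {
        sequenceId.accumulateAndGet(primarySeqId, Math::max);
        // Assumption: the replayed edit becomes readable immediately.
        memstoreReadPoint.accumulateAndGet(primarySeqId, Math::max);
    }

    long getSequenceId() { return sequenceId.get(); }
    long getMemstoreReadPoint() { return memstoreReadPoint.get(); }
}
```

Using a max-style update keeps the counters monotonic even when an older edit is delivered after a newer one.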