Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
3.0.0-alpha-2
-
None
Description
Besides HBASE-26768, there is another case in which replication in RegionReplicationSink can be suspended:
When a replication error occurs, RegionReplicationSink invokes MemStoreFlusher#requestFlush to request a flush, and it resumes replication after it receives the FlushAction#START_FLUSH or FlushAction#CANNOT_FLUSH flush marker. But when MemStoreFlusher flushes, it invokes the following method, HRegion.flushcache, with writeFlushRequestWalMarker set to false:
public FlushResultImpl flushcache(List<byte[]> families, boolean writeFlushRequestWalMarker, FlushLifeCycleTracker tracker) throws IOException { }
When writeFlushRequestWalMarker is false, HRegion.flushcache does not write the FlushAction#CANNOT_FLUSH flush marker to the WAL when the memstore is empty, as the following HRegion.writeFlushRequestMarkerToWAL illustrates:
private boolean writeFlushRequestMarkerToWAL(WAL wal, boolean writeFlushWalMarker) {
  if (writeFlushWalMarker && wal != null && !writestate.readOnly) {
    FlushDescriptor desc = ProtobufUtil.toFlushDescriptor(FlushAction.CANNOT_FLUSH,
      getRegionInfo(), -1, new TreeMap<>(Bytes.BYTES_COMPARATOR));
    try {
      WALUtil.writeFlushMarker(wal, this.getReplicationScope(), getRegionInfo(), desc, true, mvcc,
        regionReplicationSink.orElse(null));
      return true;
    } catch (IOException e) {
      LOG.warn(getRegionInfo().getEncodedName() + " : "
        + "Received exception while trying to write the flush request to wal", e);
    }
  }
  return false;
}
So if a replication error occurs while the memstore is empty (e.g. while replicating the FlushAction#START_FLUSH or FlushAction#COMMIT_FLUSH marker), replication may stay suspended until the next memstore flush, even though later user writes arrive and could be replicated normally.
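The stall can be illustrated with a toy model of the sink side. This is only a sketch under stated assumptions: ToySink and its methods are hypothetical stand-ins, not the real RegionReplicationSink API, and the enum lists only the FlushAction values mentioned in this issue. The model assumes that edits arriving while the sink is suspended are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class SinkResumeSketch {
    // Only the FlushAction values mentioned in this issue; the real enum lives in HBase.
    enum FlushAction { START_FLUSH, COMMIT_FLUSH, CANNOT_FLUSH }

    /** Hypothetical, heavily simplified stand-in for RegionReplicationSink. */
    static class ToySink {
        private boolean suspended = false;
        private int flushRequests = 0;
        final List<String> replicated = new ArrayList<>();

        /** On a replication error: suspend and ask for a flush (counted here, not performed). */
        void onReplicationError() {
            suspended = true;
            flushRequests++;
        }

        /** Only START_FLUSH or CANNOT_FLUSH resumes a suspended sink. */
        void onFlushMarker(FlushAction action) {
            if (suspended
                && (action == FlushAction.START_FLUSH || action == FlushAction.CANNOT_FLUSH)) {
                suspended = false;
            }
        }

        /** Assumption of this model: edits seen while suspended are not replicated. */
        void onEdit(String edit) {
            if (!suspended) {
                replicated.add(edit);
            }
        }

        boolean isSuspended() { return suspended; }
        int flushRequests() { return flushRequests; }
    }

    public static void main(String[] args) {
        ToySink sink = new ToySink();
        sink.onEdit("row1");
        sink.onReplicationError();   // memstore is empty, so no marker will follow
        sink.onEdit("row2");         // later user write, skipped while suspended
        System.out.println("suspended=" + sink.isSuspended()
            + " replicated=" + sink.replicated);
        sink.onFlushMarker(FlushAction.CANNOT_FLUSH);  // the marker that is never written today
        sink.onEdit("row3");
        System.out.println("suspended=" + sink.isSuspended()
            + " replicated=" + sink.replicated);
    }
}
```

In this model the sink stays suspended for as long as no START_FLUSH or CANNOT_FLUSH marker reaches it, which is the situation when the flush request hits an empty memstore and no marker is written.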
I simulate this problem in the PR. The writeFlushRequestWalMarker parameter was introduced by HBASE-11580 and only determines whether the FlushAction#CANNOT_FLUSH flush marker is written to the WAL when the memstore is empty, so I think that, for simplicity, we could always set it to true for MemStoreFlusher.
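The effect of the flag on the flush path can be sketched as follows. This is a minimal illustration, not the HRegion implementation: ToyRegion, its memstoreSize field and its walMarkers list are hypothetical, and the real flushcache takes more parameters and does far more work.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushMarkerSketch {
    // Only the FlushAction values mentioned in this issue.
    enum FlushAction { START_FLUSH, COMMIT_FLUSH, CANNOT_FLUSH }

    /** Hypothetical stand-in for a region: a memstore size and the markers "written to the WAL". */
    static class ToyRegion {
        int memstoreSize;
        final List<FlushAction> walMarkers = new ArrayList<>();

        ToyRegion(int memstoreSize) {
            this.memstoreSize = memstoreSize;
        }

        /** Loosely follows the shape of HRegion.flushcache; returns true if data was flushed. */
        boolean flushcache(boolean writeFlushRequestWalMarker) {
            if (memstoreSize == 0) {
                // Nothing to flush: a marker is written only if the caller asked for it.
                if (writeFlushRequestWalMarker) {
                    walMarkers.add(FlushAction.CANNOT_FLUSH);
                }
                return false;
            }
            walMarkers.add(FlushAction.START_FLUSH);
            memstoreSize = 0;
            walMarkers.add(FlushAction.COMMIT_FLUSH);
            return true;
        }
    }

    public static void main(String[] args) {
        ToyRegion current = new ToyRegion(0);
        current.flushcache(false);   // current MemStoreFlusher behaviour per this issue
        System.out.println("flag=false, empty memstore: " + current.walMarkers);

        ToyRegion proposed = new ToyRegion(0);
        proposed.flushcache(true);   // proposed: always pass true from MemStoreFlusher
        System.out.println("flag=true, empty memstore: " + proposed.walMarkers);
    }
}
```

With the flag set to true, an empty memstore still produces a CANNOT_FLUSH marker, which is one of the two markers that lets the sink resume replication.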
Issue Links
- relates to
  - HBASE-11580 Failover handling for secondary region replicas (Closed)
  - HBASE-26233 The region replication framework should not be built upon the general replication framework (Resolved)
- links to