HBase / HBASE-26960

Another case for unnecessary replication suspending in RegionReplicationSink


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-2
    • Fix Version/s: 3.0.0-alpha-3
    • Component/s: read replicas
    • Labels: None

    Description

      Besides HBASE-26768, there is another case in which replication in RegionReplicationSink can be suspended unnecessarily:
      When a replication error occurs, RegionReplicationSink invokes MemStoreFlusher#requestFlush to request a flush, and it resumes replication once it receives the FlushAction#START_FLUSH or FlushAction#CANNOT_FLUSH flush marker. But when MemStoreFlusher performs the flush, it invokes the following method, HRegion.flushcache, with writeFlushRequestWalMarker set to false:

        public FlushResultImpl flushcache(List<byte[]> families,
            boolean writeFlushRequestWalMarker, FlushLifeCycleTracker tracker) throws IOException {
        }
      

      When writeFlushRequestWalMarker is false and the memstore is empty, HRegion.flushcache does not write the FlushAction#CANNOT_FLUSH flush marker to the WAL, as the following HRegion.writeFlushRequestMarkerToWAL shows:

        private boolean writeFlushRequestMarkerToWAL(WAL wal, boolean writeFlushWalMarker) {
          if (writeFlushWalMarker && wal != null && !writestate.readOnly) {
            FlushDescriptor desc = ProtobufUtil.toFlushDescriptor(FlushAction.CANNOT_FLUSH,
              getRegionInfo(), -1, new TreeMap<>(Bytes.BYTES_COMPARATOR));
            try {
              WALUtil.writeFlushMarker(wal, this.getReplicationScope(), getRegionInfo(), desc, true, mvcc,
                regionReplicationSink.orElse(null));
              return true;
            } catch (IOException e) {
              LOG.warn(getRegionInfo().getEncodedName() + " : " +
                "Received exception while trying to write the flush request to wal", e);
            }
          }
          return false;
        }
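The decision above can be condensed into a small standalone model. This is only a sketch, not HBase code: `FlushMarkerSketch`, `firstMarker`, and its boolean parameters are hypothetical names, and it assumes that a flush of a non-empty memstore always starts with a START_FLUSH marker.

```java
import java.util.Optional;

/** Toy model of which flush marker (if any) a flush request writes first. Not HBase code. */
final class FlushMarkerSketch {
  enum FlushAction { START_FLUSH, COMMIT_FLUSH, CANNOT_FLUSH }

  /**
   * Returns the first flush marker written to the WAL for one flush request,
   * or Optional.empty() if no marker is written at all.
   * Assumption: a non-empty memstore always yields START_FLUSH.
   */
  static Optional<FlushAction> firstMarker(boolean memstoreEmpty,
      boolean writeFlushRequestWalMarker, boolean walAvailable, boolean readOnly) {
    if (!memstoreEmpty) {
      return Optional.of(FlushAction.START_FLUSH);
    }
    // Mirrors the condition in writeFlushRequestMarkerToWAL quoted above.
    if (writeFlushRequestWalMarker && walAvailable && !readOnly) {
      return Optional.of(FlushAction.CANNOT_FLUSH);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Empty memstore, flag false: nothing reaches the WAL.
    System.out.println(firstMarker(true, false, true, false));
    // Empty memstore, flag true: CANNOT_FLUSH is written.
    System.out.println(firstMarker(true, true, true, false));
  }
}
```

In this model the only combination that produces no marker on a writable region with a WAL is an empty memstore together with writeFlushRequestWalMarker set to false, which is the case this issue is about.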
      

      So when a replication error occurs while the memstore is empty (e.g. while replicating the FlushAction#START_FLUSH or FlushAction#COMMIT_FLUSH marker), replication may stay suspended until the next memstore flush, even though user writes arrive later and could be replicated normally.

      I simulate this problem in the PR. The writeFlushRequestWalMarker parameter was introduced by HBASE-11580, and it only determines whether the FlushAction#CANNOT_FLUSH flush marker is written to the WAL when the memstore is empty, so I think that, for simplicity, MemStoreFlusher could always set it to true.
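The effect of the flag can be illustrated with a toy simulation. This is not the test from the PR and uses no HBase classes: `ToySink` and `ToyRegion` are hypothetical stand-ins that keep only the suspend/resume rule and the empty-memstore rule described in this issue.

```java
/** Toy simulation of the suspend/resume behaviour described above. Not HBase code. */
final class ReplicationSuspendSketch {
  enum FlushAction { START_FLUSH, COMMIT_FLUSH, CANNOT_FLUSH }

  /** Hypothetical stand-in for RegionReplicationSink: tracks only the suspended flag. */
  static final class ToySink {
    private boolean suspended;

    void onReplicationError() {
      suspended = true;
    }

    void onFlushMarker(FlushAction action) {
      // Replication resumes only on START_FLUSH or CANNOT_FLUSH.
      if (action == FlushAction.START_FLUSH || action == FlushAction.CANNOT_FLUSH) {
        suspended = false;
      }
    }

    boolean isSuspended() {
      return suspended;
    }
  }

  /** Hypothetical stand-in for a region whose flush emits markers to the sink. */
  static final class ToyRegion {
    private final ToySink sink;
    private int memstoreCells;

    ToyRegion(ToySink sink) {
      this.sink = sink;
    }

    void put() {
      memstoreCells++;
    }

    void flush(boolean writeFlushRequestWalMarker) {
      if (memstoreCells > 0) {
        sink.onFlushMarker(FlushAction.START_FLUSH);
        memstoreCells = 0;
        sink.onFlushMarker(FlushAction.COMMIT_FLUSH);
      } else if (writeFlushRequestWalMarker) {
        sink.onFlushMarker(FlushAction.CANNOT_FLUSH);
      }
      // Empty memstore and flag false: no marker, so the sink hears nothing.
    }
  }

  /** A replication error hits while the memstore is empty, then the requested flush runs. */
  static boolean suspendedAfterErrorAndFlush(boolean writeFlushRequestWalMarker) {
    ToySink sink = new ToySink();
    ToyRegion region = new ToyRegion(sink);
    sink.onReplicationError();
    region.flush(writeFlushRequestWalMarker);
    return sink.isSuspended();
  }

  public static void main(String[] args) {
    System.out.println("flag=false, still suspended: " + suspendedAfterErrorAndFlush(false));
    System.out.println("flag=true, still suspended: " + suspendedAfterErrorAndFlush(true));
  }
}
```

With the flag set to false the toy sink stays suspended after the flush, and with the flag set to true the CANNOT_FLUSH marker resumes it, which matches the reasoning behind always passing true from MemStoreFlusher.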

            People

              comnetwork chenglei
              comnetwork chenglei