HBASE-24813

ReplicationSource should clear buffer usage on ReplicationSourceManager upon termination


Details

    Description

      Following the investigation of the issue described by elserj in HBASE-24779, we found that once a peer is removed, which kills that peer's ReplicationSource instance, ReplicationSourceManager.totalBufferUsed may be left inconsistent. This can happen if ReplicationSourceWALReader had put some entries on its queue to be processed by ReplicationSourceShipper, but the peer removal killed the shipper before it could process the pending entries. When the ReplicationSourceWALReader thread adds entries to the queue, it increments ReplicationSourceManager.totalBufferUsed by the sum of the entry sizes. When those entries are read by ReplicationSourceShipper, ReplicationSourceManager.totalBufferUsed is decreased accordingly. We should also decrease ReplicationSourceManager.totalBufferUsed when a ReplicationSource is terminated; otherwise the size of those unprocessed entries keeps counting against ReplicationSourceManager.totalBufferUsed indefinitely, unless the RS gets restarted. This may be a problem for deployments with multiple peers, or if new peers are added.
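
The accounting described above can be illustrated with a minimal Java sketch. This is a simplified model, not the HBase code: the class BufferAccountingSketch, its nested Source class, and the methods readerAdds, shipperShipsOne and terminate are hypothetical names chosen for illustration, and batches are reduced to their sizes in bytes. It only shows why releasing the pending sizes on termination keeps the shared counter consistent.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the accounting in this issue; not the HBase classes.
final class BufferAccountingSketch {
  // Stands in for ReplicationSourceManager.totalBufferUsed, shared by all sources.
  final AtomicLong totalBufferUsed = new AtomicLong();

  // Stands in for one ReplicationSource and the queue between its reader and shipper.
  final class Source {
    // Each element is the size in bytes of one batch waiting to be shipped.
    private final BlockingQueue<Long> pendingBatchSizes = new LinkedBlockingQueue<>();

    // Reader side: enqueue a batch and reserve its size in the shared counter.
    void readerAdds(long batchSizeBytes) {
      pendingBatchSizes.add(batchSizeBytes);
      totalBufferUsed.addAndGet(batchSizeBytes);
    }

    // Shipper side: take one batch and release its size.
    // Returns false when nothing is pending.
    boolean shipperShipsOne() {
      Long size = pendingBatchSizes.poll();
      if (size == null) {
        return false;
      }
      totalBufferUsed.addAndGet(-size);
      return true;
    }

    // Termination: drain batches that were never shipped and release their sizes,
    // so they stop counting against the shared quota. Returns the bytes released.
    long terminate() {
      long released = 0;
      Long size;
      while ((size = pendingBatchSizes.poll()) != null) {
        released += size;
      }
      totalBufferUsed.addAndGet(-released);
      return released;
    }
  }

  Source newSource() {
    return new Source();
  }
}
```

In this model, skipping the drain in terminate() would leave the sizes of the unshipped batches in totalBufferUsed for the lifetime of the process, which is the leak this issue describes.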

      Attachments

        1. image-2020-10-09-10-50-00-372.png
          18 kB
          Sun Xin
        2. TestReplicationSyncUpTool.log
          29.12 MB
          Duo Zhang


            People

              wchevreuil Wellington Chevreuil
              Votes: 0
              Watchers: 11
