[HBASE-24813] ReplicationSource should clear buffer usage on ReplicationSourceManager upon termination - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0-alpha-1, 2.4.0, 2.2.6, 2.3.4, 2.5.0
Fix Version/s: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.4.1
Component/s: Replication
Labels:
- pull-request-available

Description

While investigating the issue described by elserj in HBASE-24779, we found that removing a peer, which kills that peer's ReplicationSource instance, may leave ReplicationSourceManager.totalBufferUsed inconsistent. This can happen if ReplicationSourceWALReader had put entries on its queue to be processed by ReplicationSourceShipper, but the peer removal killed the shipper before it could process them.

When the ReplicationSourceWALReader thread adds entries to the queue, it increments ReplicationSourceManager.totalBufferUsed by the sum of the entry sizes. When ReplicationSourceShipper reads those entries, ReplicationSourceManager.totalBufferUsed is decreased accordingly. We should also decrease ReplicationSourceManager.totalBufferUsed when a ReplicationSource is terminated; otherwise the size of the unprocessed entries keeps counting against ReplicationSourceManager.totalBufferUsed indefinitely, until the RS is restarted. This may be a problem for deployments with multiple peers, or if new peers are added.
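
The accounting described above can be illustrated with a minimal, self-contained Java sketch. It is not the HBase code or the actual patch: the classes `Manager`, `Batch`, and `Source` and the methods `readerEnqueue`, `shipperShipOne`, and `terminate` are hypothetical stand-ins, and only the `totalBufferUsed` counter name is taken from the description. The sketch assumes a single shared counter that the reader side increments and the shipper side decrements, and shows termination draining the queue so that unshipped entries no longer count against the counter.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

class BufferAccountingSketch {

  /** Hypothetical stand-in for the shared counter kept by the manager. */
  static final class Manager {
    final AtomicLong totalBufferUsed = new AtomicLong();
  }

  /** Hypothetical stand-in for a batch of WAL entries; only its size matters here. */
  static final class Batch {
    final long sizeInBytes;

    Batch(long sizeInBytes) {
      this.sizeInBytes = sizeInBytes;
    }
  }

  /** Hypothetical stand-in for one replication source and its reader-to-shipper queue. */
  static final class Source {
    private final Manager manager;
    private final BlockingQueue<Batch> queue = new LinkedBlockingQueue<>();

    Source(Manager manager) {
      this.manager = manager;
    }

    /** Reader side: enqueue a batch and charge its size to the shared counter. */
    void readerEnqueue(Batch batch) {
      queue.add(batch);
      manager.totalBufferUsed.addAndGet(batch.sizeInBytes);
    }

    /** Shipper side: take one batch, if any, and release its size. */
    boolean shipperShipOne() {
      Batch batch = queue.poll();
      if (batch == null) {
        return false;
      }
      manager.totalBufferUsed.addAndGet(-batch.sizeInBytes);
      return true;
    }

    /** Termination: drain batches that never shipped and release their total size. */
    long terminate() {
      long released = 0;
      Batch batch;
      while ((batch = queue.poll()) != null) {
        released += batch.sizeInBytes;
      }
      manager.totalBufferUsed.addAndGet(-released);
      return released;
    }
  }

  public static void main(String[] args) {
    Manager manager = new Manager();
    Source source = new Source(manager);

    source.readerEnqueue(new Batch(100));
    source.readerEnqueue(new Batch(250));
    source.shipperShipOne(); // ships the 100-byte batch only

    // Without cleanup on termination, these 250 bytes would stay charged.
    System.out.println("before terminate: " + manager.totalBufferUsed.get()); // 250
    source.terminate();
    System.out.println("after terminate: " + manager.totalBufferUsed.get()); // 0
  }
}
```

In this sketch, skipping the `terminate()` call leaves 250 bytes permanently charged to the counter, which mirrors the leak described above: other sources sharing the same counter would see less available buffer until the process restarts.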