Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 3.0.0-alpha-1, 2.4.0, 2.2.6, 2.3.4, 2.5.0
Description
Following investigation of the issue described by elserj on HBASE-24779, we found that removing a peer, which kills the peer's related ReplicationSource instance, may leave ReplicationSourceManager.totalBufferUsed inconsistent. This can happen if ReplicationSourceWALReader had put some entries on its queue to be processed by ReplicationSourceShipper, but the peer removal killed the shipper before it could process the pending entries. When the ReplicationSourceWALReader thread adds entries to the queue, it increments ReplicationSourceManager.totalBufferUsed by the sum of the entries' sizes. When those entries are read by ReplicationSourceShipper, ReplicationSourceManager.totalBufferUsed is decreased accordingly. We should also decrease ReplicationSourceManager.totalBufferUsed when a ReplicationSource is terminated; otherwise the size of those unprocessed entries keeps counting against ReplicationSourceManager.totalBufferUsed indefinitely, unless the RS gets restarted. This may be a problem for deployments with multiple peers, or if new peers are added.
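The accounting described above can be illustrated with a minimal, self-contained sketch. This is not the HBase code or the actual patch: the class BufferAccountingSketch and the names Batch, Source, readerEnqueue, shipperShipOne and terminate are hypothetical, and only the increment on enqueue, decrement on ship, and release on termination mirror the description.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the buffer accounting; not HBase code.
class BufferAccountingSketch {
  // Stand-in for ReplicationSourceManager.totalBufferUsed, shared by all sources.
  static final AtomicLong totalBufferUsed = new AtomicLong();

  // Hypothetical batch of WAL entries; only its size matters here.
  static final class Batch {
    final long sizeBytes;
    Batch(long sizeBytes) { this.sizeBytes = sizeBytes; }
  }

  // Hypothetical stand-in for one ReplicationSource with its reader queue.
  static final class Source {
    final BlockingQueue<Batch> queue = new LinkedBlockingQueue<>();

    // Reader side: account for the batch size, then queue it for the shipper.
    void readerEnqueue(long sizeBytes) {
      totalBufferUsed.addAndGet(sizeBytes);
      queue.add(new Batch(sizeBytes));
    }

    // Shipper side: take one batch and give its size back to the shared counter.
    boolean shipperShipOne() {
      Batch batch = queue.poll();
      if (batch == null) {
        return false; // nothing pending
      }
      totalBufferUsed.addAndGet(-batch.sizeBytes);
      return true;
    }

    // Termination: drain batches the shipper never saw and release their size.
    // Without this step the counter would stay inflated after peer removal.
    long terminate() {
      long released = 0;
      Batch batch;
      while ((batch = queue.poll()) != null) {
        released += batch.sizeBytes;
      }
      totalBufferUsed.addAndGet(-released);
      return released;
    }
  }
}
```

In this sketch, if a source enqueues batches of 100 and 250 bytes and the shipper handles only the first before the source is terminated, terminate() releases the remaining 250 bytes and the shared counter returns to zero. Skipping that drain step reproduces the leak: the 250 bytes would stay counted until the process restarts.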
Attachments
Issue Links
- breaks
- HBASE-25117 ReplicationSourceShipper thread can not be finished
- Resolved
- is related to
- HBASE-24779 Improve insight into replication WAL readers hung on checkQuota
- Resolved
- links to