Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
1.7.1, 2.4.5
None
Reviewed
Description
```java
void preLogRoll(Path newLog) throws IOException {
  recordLog(newLog);
  String logName = newLog.getName();
  String logPrefix = DefaultWALProvider.getWALPrefixFromWALName(logName);
  synchronized (latestPaths) {
    Iterator<Path> iterator = latestPaths.iterator();
    while (iterator.hasNext()) {
      Path path = iterator.next();
      // Substring match: a prefix such as "regionserver.null1" also matches "regionserver.null11.<ts>"
      if (path.getName().contains(logPrefix)) {
        iterator.remove();
        break;
      }
    }
    this.latestPaths.add(newLog);
  }
}
```
ReplicationSourceManager uses latestPaths to track the latest WAL of each WAL group, and all of them are enqueued for replication when a new replication peer is added.
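To make the intended invariant concrete (exactly one latest WAL per group), here is a minimal sketch that keys the tracking by the exact group prefix instead of scanning a list. The class LatestWalTracker and its methods are hypothetical, WAL names are plain strings rather than Hadoop Path objects, and groupPrefix only approximates DefaultWALProvider.getWALPrefixFromWALName (it ignores meta-WAL handling).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one entry per WAL group, keyed by the exact group prefix.
class LatestWalTracker {
    private final Map<String, String> latestByGroup = new LinkedHashMap<>();

    // Approximation of DefaultWALProvider.getWALPrefixFromWALName (assumption):
    // drop the trailing ".<timestamp>"; meta-WAL suffix handling is ignored.
    static String groupPrefix(String walName) {
        int dot = walName.lastIndexOf('.');
        return dot < 0 ? walName : walName.substring(0, dot);
    }

    // On a log roll, overwrite only the entry of the exact same group.
    synchronized void onLogRoll(String newWalName) {
        latestByGroup.put(groupPrefix(newWalName), newWalName);
    }

    // Snapshot of the WALs that would be enqueued when a new peer is added.
    synchronized List<String> latestWals() {
        return new ArrayList<>(latestByGroup.values());
    }

    public static void main(String[] args) {
        LatestWalTracker tracker = new LatestWalTracker();
        tracker.onLogRoll("regionserver.null11.1000");
        tracker.onLogRoll("regionserver.null1.1000");
        tracker.onLogRoll("regionserver.null1.2000");
        // prints [regionserver.null11.1000, regionserver.null1.2000]
        System.out.println(tracker.latestWals());
    }
}
```

Because the map key is the whole prefix, "regionserver.null1" and "regionserver.null11" can never replace each other, whatever the roll order.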
If hbase.wal.regiongrouping.numgroups is set above 10, say to 12, the WAL names range from regionserver.null0.timestamp to regionserver.null11.timestamp. preLogRoll uses String.contains to replace the old log of the same group, so when a new regionserver.null1.ts is rolled, regionserver.null11.ts may be removed instead, and latestPaths keeps growing with wrong logs.
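The threshold of 10 groups can be checked mechanically. The sketch below builds WAL names following the regionserver.null<i>.<timestamp> pattern from this description (the fixed timestamp 1000 and the helper names are illustrative assumptions) and lists every pair of distinct groups for which String.contains reports a match.

```java
import java.util.ArrayList;
import java.util.List;

class GroupPrefixCollisions {
    // Naming pattern taken from the description; the timestamp is arbitrary.
    static String walName(int group, long ts) {
        return "regionserver.null" + group + "." + ts;
    }

    // Strip the trailing ".<timestamp>" to get the group prefix.
    static String groupPrefix(String walName) {
        return walName.substring(0, walName.lastIndexOf('.'));
    }

    // Collect "a -> b" for each pair of different groups where b's WAL name contains a's prefix.
    static List<String> collisions(int numGroups) {
        List<String> result = new ArrayList<>();
        for (int a = 0; a < numGroups; a++) {
            String prefix = groupPrefix(walName(a, 1000L));
            for (int b = 0; b < numGroups; b++) {
                if (a != b && walName(b, 1000L).contains(prefix)) {
                    result.add("null" + a + " -> null" + b);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("numgroups=10: " + collisions(10)); // prints numgroups=10: []
        System.out.println("numgroups=12: " + collisions(12)); // prints numgroups=12: [null1 -> null10, null1 -> null11]
    }
}
```

With at most 10 groups every index is a single digit, so no group prefix is a substring of another group's name; with 12 groups the prefix of null1 also matches null10 and null11.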
Replication then partially stalls because regionserver.null1.ts no longer exists on HDFS, and data may not be replicated to the slave cluster because regionserver.null11.ts is not in the replication queue at startup.
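The effect on latestPaths can be reproduced with a small simulation. This is a sketch under stated assumptions: WAL names are plain strings instead of Hadoop Path objects, the roll sequence and timestamps are made up, groupPrefix approximates DefaultWALProvider.getWALPrefixFromWALName, and rollExact is one possible exact-prefix comparison, not necessarily the change that was committed for this issue.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class PreLogRollSimulation {
    static String groupPrefix(String walName) {
        return walName.substring(0, walName.lastIndexOf('.'));
    }

    // Mirrors the loop in preLogRoll: remove the first entry whose name contains the prefix.
    static void rollBuggy(List<String> latestPaths, String newLog) {
        String logPrefix = groupPrefix(newLog);
        Iterator<String> it = latestPaths.iterator();
        while (it.hasNext()) {
            if (it.next().contains(logPrefix)) {
                it.remove();
                break;
            }
        }
        latestPaths.add(newLog);
    }

    // Hypothetical variant: remove only entries whose group prefix is exactly equal.
    static void rollExact(List<String> latestPaths, String newLog) {
        String logPrefix = groupPrefix(newLog);
        latestPaths.removeIf(p -> groupPrefix(p).equals(logPrefix));
        latestPaths.add(newLog);
    }

    public static void main(String[] args) {
        String[] rolls = {
            "regionserver.null1.1000", "regionserver.null11.1000",
            "regionserver.null1.2000", "regionserver.null1.3000"
        };
        List<String> buggy = new ArrayList<>();
        List<String> exact = new ArrayList<>();
        for (String log : rolls) {
            rollBuggy(buggy, log);
            rollExact(exact, log);
        }
        // prints contains: [regionserver.null1.2000, regionserver.null1.3000]
        System.out.println("contains: " + buggy);
        // prints exact:    [regionserver.null11.1000, regionserver.null1.3000]
        System.out.println("exact:    " + exact);
    }
}
```

After the fourth roll, the contains-based loop has dropped regionserver.null11.1000 and still holds the superseded regionserver.null1.2000: one entry may point at a WAL that is already archived, and one group will not be enqueued. The exact-prefix variant keeps one entry per group.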
Because of ZOOKEEPER-706, if there are too many logs under the ZooKeeper znode /hbase/replication/rs/regionserver/peer, remove_peer may fail to delete this znode, and other region servers cannot pick up this queue for replication failover.