Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
4.2.2, 4.3.0
-
None
Description
Scenario:
Step1 : Have three bookies BK1, BK2, BK3
Step2 : Have written ledgers with quorum 2
Step3 : Unfortunately BK2 and BK3 both went down for few moments.
The following logs are flooded in BK1 autorecovery logs. RW is trying to replicate the ledgers, but it simply skip this fragment and moves to next cycle when it sees a replica found in his hand. IMO, we should have a mechanism in place to avoid unnecessary cycles.
2014-02-18 21:47:55,140 - ERROR - [New I/O client boss #2-1:PerChannelBookieClient$1@230] - Could not connect to bookie: [id: 0x00ba679e]/10.18.170.130:15002, current state CONNECTING : java.net.ConnectException: Connection refused: no further information at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:401) at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:370) at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:292) at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) 2014-02-18 21:47:55,140 - INFO - 2014-02-18 21:59:33,215 - DEBUG - [ReplicationWorker:ReplicationWorker@182] - Target Bookie[10.18.170.130:15003] found in the fragment ensemble: [10.18.170.130:15003, 10.18.170.130:15001, 10.18.170.130:15002] [ReplicationWorker:PerChannelBookieClient@194] - Connecting to bookie: 10.18.170.130:15002 2014-02-18 21:47:56,162 - ERROR - [New I/O client boss #2-1:PerChannelBookieClient$1@230] - Could not connect to bookie: [id: 0x0003f377]/10.18.170.130:15002, current state CONNECTING : java.net.ConnectException: Connection refused: no further information at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:401) at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:370) at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:292) at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) 2014-02-18 21:59:33,215 - DEBUG - [ReplicationWorker:ReplicationWorker@182] - Target Bookie[10.18.170.130:15003] found in the fragment ensemble: [10.18.170.130:15003, 10.18.170.130:15001, 10.18.170.130:15002]
Attachments
Issue Links
- depends upon
-
BOOKKEEPER-749 Improve LedgerUnderreplicationManager#getLedgerToRereplicate()
- Open