Uploaded image for project: 'Bookkeeper'
  1. Bookkeeper
  2. BOOKKEEPER-733

Improve ReplicationWorker to handle the urLedgers which already have same leder replica in hand

    XMLWordPrintableJSON

Details

    Description

      Scenario:

      Step1 : Have three bookies BK1, BK2, BK3
      Step2 : Have written ledgers with quorum 2
      Step3 : Unfortunately BK2 and BK3 both went down for few moments.

      The following logs are flooded in BK1 autorecovery logs. RW is trying to replicate the ledgers, but it simply skip this fragment and moves to next cycle when it sees a replica found in his hand. IMO, we should have a mechanism in place to avoid unnecessary cycles.

      2014-02-18 21:47:55,140 - ERROR - [New I/O client boss #2-1:PerChannelBookieClient$1@230] - Could not connect to bookie: [id: 0x00ba679e]/10.18.170.130:15002, current state CONNECTING : 
      java.net.ConnectException: Connection refused: no further information
      	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
      	at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:401)
      	at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:370)
      	at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:292)
      	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
      	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
      	at java.lang.Thread.run(Thread.java:619)
      2014-02-18 21:47:55,140 - INFO  - 2014-02-18 21:59:33,215 - DEBUG  - [ReplicationWorker:ReplicationWorker@182] - Target Bookie[10.18.170.130:15003] found in the fragment ensemble: [10.18.170.130:15003, 10.18.170.130:15001, 10.18.170.130:15002]
      [ReplicationWorker:PerChannelBookieClient@194] - Connecting to bookie: 10.18.170.130:15002
      2014-02-18 21:47:56,162 - ERROR - [New I/O client boss #2-1:PerChannelBookieClient$1@230] - Could not connect to bookie: [id: 0x0003f377]/10.18.170.130:15002, current state CONNECTING : 
      java.net.ConnectException: Connection refused: no further information
      	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
      	at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:401)
      	at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:370)
      	at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:292)
      	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
      	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
      	at java.lang.Thread.run(Thread.java:619)
      2014-02-18 21:59:33,215 - DEBUG  - [ReplicationWorker:ReplicationWorker@182] - Target Bookie[10.18.170.130:15003] found in the fragment ensemble: [10.18.170.130:15003, 10.18.170.130:15001, 10.18.170.130:15002]
      
      

      Attachments

        Issue Links

          Activity

            People

              rakeshr Rakesh Radhakrishnan
              rakeshr Rakesh Radhakrishnan
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated: