  1. Apache Avro
  2. AVRO-1407

NettyTransceiver can cause an infinite loop when slow to connect

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.5, 1.7.6
    • Fix Version/s: 1.8.0, 1.9.0
    • Component/s: java
    • Labels: None

    Description

      When a new NettyTransceiver is created, it forces the channel to be allocated and connected to the remote host, waiting up to connectTimeout ms on the connect ChannelFuture. That is obviously a good thing. The problem is what happens when the connect is unsuccessful, i.e. !channelFuture.isSuccess(): an exception is thrown and the constructor call fails with an IOException, but this can leave an active channel associated with the ChannelFactory.

      A Netty NioClientSocketChannelFactory will not shut down while active channels are still around. If you supplied the ChannelFactory to the NettyTransceiver yourself, you will therefore not be able to shut it down by calling ChannelFactory.releaseExternalResources(), as the Flume Avro RPC client does. To recreate this you need a very laggy network, where the connect attempt takes longer than the connect timeout but does eventually succeed. This is very hard to organise in a test case, although I do have a test setup using Vagrant VMs that recreates it every time, using the Flume RPC client and server.
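
      The shutdown behaviour described above can be illustrated with a toy model in plain java.util.concurrent. This is not Netty code: StuckShutdownSketch, workerFor and tryTerminate are hypothetical names, and the model simply assumes that a worker stays busy for as long as its channel is open. The retry loop is bounded here so that the example finishes; an unbounded loop of the same shape would spin forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy model (not Netty code): a worker pool cannot terminate while a "channel" keeps its worker busy. */
class StuckShutdownSketch {

    /**
     * Hypothetical stand-in for a worker that services one open channel.
     * Assumption: it keeps running, even when interrupted, until the channel is closed.
     */
    static Runnable workerFor(CountDownLatch channelClosed) {
        return () -> {
            boolean interrupted = false;
            while (channelClosed.getCount() > 0) {
                try {
                    channelClosed.await();
                } catch (InterruptedException e) {
                    interrupted = true; // remember the interrupt, but keep serving the channel
                }
            }
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        };
    }

    /**
     * Bounded "shutdownNow, then awaitTermination" retry loop.
     * Returns true if the pool terminated within maxAttempts, false otherwise.
     */
    static boolean tryTerminate(ExecutorService pool, int maxAttempts) throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            pool.shutdownNow();
            if (pool.awaitTermination(50, TimeUnit.MILLISECONDS)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch channelClosed = new CountDownLatch(1);
        ExecutorService workers = Executors.newSingleThreadExecutor();
        workers.execute(workerFor(channelClosed));

        // The leaked channel is still open, so the pool does not terminate.
        System.out.println("terminated while channel open: " + tryTerminate(workers, 5));

        // Closing the channel frees the worker and lets the pool terminate.
        channelClosed.countDown();
        System.out.println("terminated after channel closed: " + tryTerminate(workers, 5));
    }
}
```

      In this model the first attempt reports false and the second reports true, which mirrors the report: the shutdown only completes once the stray channel has gone away.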

      The following stack is from a production system. It will not recover until the channel is disconnected (by forcing a disconnect at the remote host) or the JVM is restarted.

      Production stack trace
      "TLOG-0" daemon prio=10 tid=0x00007f581c7be800 nid=0x39a1 waiting on condition [0x00007f57ef9f2000]
        java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        parking to wait for <0x00000007218b16e0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
        at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1253)
        at org.jboss.netty.util.internal.ExecutorUtil.terminate(ExecutorUtil.java:103)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.releaseExternalResources(AbstractNioWorkerPool.java:80)
        at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.releaseExternalResources(NioClientSocketChannelFactory.java:181)
        at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:142)
        at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:101)
        at org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:564)
        locked <0x00000006c30ae7b0> (a org.apache.flume.api.NettyAvroRpcClient)
        at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:88)
        at org.apache.flume.api.LoadBalancingRpcClient.createClient(LoadBalancingRpcClient.java:214)
        at org.apache.flume.api.LoadBalancingRpcClient.getClient(LoadBalancingRpcClient.java:205)
        locked <0x00000006a97b18e8> (a org.apache.flume.api.LoadBalancingRpcClient)
        at org.apache.flume.api.LoadBalancingRpcClient.appendBatch(LoadBalancingRpcClient.java:95)
        at com.ean.platform.components.tlog.client.service.AvroRpcEventRouter$1.call(AvroRpcEventRouter.java:45)
        at com.ean.platform.components.tlog.client.service.AvroRpcEventRouter$1.call(AvroRpcEventRouter.java:43)
      

      The solution is very simple, and a patch should be along in a moment.
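
      One way to picture such a fix, sketched in plain Java rather than taken from the attached patches: if the connect does not succeed within the timeout, close the channel before throwing the IOException, so nothing is left registered with the factory. FakeChannel and awaitConnect are hypothetical names used only for illustration, and CompletableFuture stands in for the connect ChannelFuture.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model (not the AVRO-1407 patch itself): never leave a half-open channel behind a failed connect. */
class ConnectCleanupSketch {

    /** Hypothetical minimal channel; stands in for a Netty Channel. */
    static final class FakeChannel {
        private final AtomicBoolean open = new AtomicBoolean(true);

        void close() {
            open.set(false);
        }

        boolean isOpen() {
            return open.get();
        }
    }

    /**
     * Waits up to timeoutMs for the connect to complete.
     * On any failure the pending connect is cancelled and the channel is closed
     * before the IOException reaches the caller.
     */
    static FakeChannel awaitConnect(FakeChannel channel, CompletableFuture<Void> connectFuture, long timeoutMs)
            throws IOException {
        boolean connected = false;
        try {
            connectFuture.get(timeoutMs, TimeUnit.MILLISECONDS);
            connected = true;
            return channel;
        } catch (TimeoutException | ExecutionException e) {
            throw new IOException("Error connecting within " + timeoutMs + " ms", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while connecting", e);
        } finally {
            if (!connected) {
                connectFuture.cancel(true); // a late success must not resurrect the connection
                channel.close();            // release the channel so the factory can shut down
            }
        }
    }

    public static void main(String[] args) {
        FakeChannel slow = new FakeChannel();
        CompletableFuture<Void> neverInTime = new CompletableFuture<>();
        try {
            awaitConnect(slow, neverInTime, 50);
        } catch (IOException expected) {
            System.out.println("connect failed, channel still open: " + slow.isOpen());
        }
    }
}
```

      The cleanup sits in a finally block so that every failure path, including an interrupt, closes the channel; the successful path returns it untouched.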

      Attachments

        1. AVRO-1407-1.patch
          1 kB
          Gareth Davis
        2. AVRO-1407-testcase.patch
          3 kB
          Gareth Davis
        3. AVRO-1407-2.patch
          1.0 kB
          Gareth Davis

          People

            Assignee: Gareth Davis (gareth@logicalpractice.com)
            Reporter: Gareth Davis (gareth@logicalpractice.com)
            Votes: 0
            Watchers: 7
