Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-1793

Drill statements fail with CuratorConnectionLossException / FragmentSetupException

    XMLWordPrintableJSON

Details

    Description

      CTAS statements on about 10TB of data (TPCH) fail towards the end after majority of data conversion is done.

      The following error is seen in Sqlline / Drillbit.out

      10:32:07.286 [CuratorFramework-0] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (10.10.104.101:5181,10.10.104.102:5181,10.10.104.103:5181) and timeout (5000) / elapsed (5001)
      org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
      at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198) [curator-client-2.5.0.jar:na]
      at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [curator-client-2.5.0.jar:na]
      at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [curator-client-2.5.0.jar:na]
      at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:807) [curator-framework-2.5.0.jar:na]
      at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:793) [curator-framework-2.5.0.jar:na]
      at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57) [curator-framework-2.5.0.jar:na]
      at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275) [curator-framework-2.5.0.jar:na]
      at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_71]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
      at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]

      The following error was seen in Drillbit.log couple of minutes before this error was seen on sqlline/drillbit.out:

      2014-12-01 13:34:06,447 [BitServer-8] ERROR o.a.drill.exec.rpc.data.DataServer - Failure while getting fragment manager. 4c80fb01-8d5e-4a6d-8f0f-5943d6ceb74f:7:564
      org.apache.drill.exec.exception.FragmentSetupException: Failed to receive plan fragment that was required for id: 4c80fb01-8d5e-4a6d-8f0f-5943d6ceb74f:7:564
      at org.apache.drill.exec.rpc.control.WorkEventBus.getFragmentManager(WorkEventBus.java:107) ~[drill-java-exec-0.6.0.r2-incubating-SNAPSHOT-rebuffed.jar:0.6.0.r2-incubating-SNAPSHOT]
      at org.apache.drill.exec.rpc.data.DataServer.handle(DataServer.java:111) [drill-java-exec-0.6.0.r2-incubating-SNAPSHOT-rebuffed.jar:0.6.0.r2-incubating-SNAPSHOT]
      at org.apache.drill.exec.rpc.data.DataServer.handle(DataServer.java:48) [drill-java-exec-0.6.0.r2-incubating-SNAPSHOT-rebuffed.jar:0.6.0.r2-incubating-SNAPSHOT]
      at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:194) [drill-java-exec-0.6.0.r2-incubating-SNAPSHOT-rebuffed.jar:0.6.0.r2-incubating-SNAPSHOT]
      at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:173) [drill-java-exec-0.6.0.r2-incubating-SNAPSHOT-rebuffed.jar:0.6.0.r2-incubating-SNAPSHOT]
      at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-codec-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-transport-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) [netty-transport-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-transport-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) [netty-transport-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:161) [netty-codec-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-transport-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) [netty-transport-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-transport-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) [netty-transport-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) [netty-transport-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130) [netty-transport-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) [netty-transport-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) [netty-transport-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) [netty-transport-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [netty-transport-4.0.24.Final.jar:4.0.24.Final]
      at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) [netty-common-4.0.24.Final.jar:4.0.24.Final]
      at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
      2014-12-01 13:34:06,519 [BitServer-6] ERROR o.a.d.exec.rpc.control.WorkEventBus - A fragment message arrived but there was no registered listener for that message for handle query_id {
      part1: 5512681927986793069
      part2: -8138187853733644465
      }
      major_fragment_id: 2
      minor_fragment_id: 424
      .

      Log files have been attached.

      Attachments

        1. drillbit.out.log
          1.47 MB
          Abhishek Girish
        2. drillbit.log
          4.54 MB
          Abhishek Girish

        Activity

          People

            sphillips Steven Phillips
            agirish Abhishek Girish
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: