Cassandra / CASSANDRA-10687

Cassandra write query timeouts when adding a new node to the cluster


    Details

    • Severity:
      Normal
    • Since Version:

      Description

      We get many of the errors below when adding a single new node to an 8-node cluster. We saw the same behaviour when adding the 9th node in the AUS data center and, after that completed, again when adding the 10th node in the TAM data center.
      First, why does this happen while the node is joining?
      LOCAL_ONE (2 replica were required but only 1 acknowledged the write)
      Since when does LOCAL_ONE require 2 replicas?
      Second, why do we see so much overhead across the whole cluster while a node is joining?

      com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_ONE (2 replica were required but only 1 acknowledged the write)
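One plausible reading of the "2 replica were required" wording, stated here as an assumption rather than taken from this report: while a node is bootstrapping, the coordinator also sends each write to the pending (joining) replica and raises the number of acknowledgements it waits for by the number of such replicas in the local data center, so that the new node does not miss writes made during its join. The Java sketch below only illustrates that arithmetic; the class `BlockForSketch`, its `Replica` type, and the method `requiredAcksLocalOne` are hypothetical and are not Cassandra's source code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Cassandra's implementation: how many acks a
// coordinator might wait for on a LOCAL_ONE write while a node is joining.
public class BlockForSketch {

    // Hypothetical replica descriptor: data center name and whether the node
    // is still joining (a "pending" replica for the token range being written).
    static final class Replica {
        final String dataCenter;
        final boolean pending;

        Replica(String dataCenter, boolean pending) {
            this.dataCenter = dataCenter;
            this.pending = pending;
        }
    }

    // LOCAL_ONE on its own needs one acknowledgement from the local data center.
    static final int LOCAL_ONE_BASE = 1;

    // Assumed rule: add one required ack per pending replica in the local data center.
    static int requiredAcksLocalOne(List<Replica> replicas, String localDc) {
        int pendingLocal = 0;
        for (Replica r : replicas) {
            if (r.pending && r.dataCenter.equals(localDc)) {
                pendingLocal++;
            }
        }
        return LOCAL_ONE_BASE + pendingLocal;
    }

    public static void main(String[] args) {
        // Illustrative replica set for one partition while a TAM node is joining.
        List<Replica> replicas = Arrays.asList(
                new Replica("AUS", false),
                new Replica("TAM", false),
                new Replica("TAM", true));
        System.out.println(requiredAcksLocalOne(replicas, "TAM")); // 2
        System.out.println(requiredAcksLocalOne(replicas, "AUS")); // 1
    }
}
```

If this assumption holds, a LOCAL_ONE write coordinated in TAM would wait for 2 acknowledgements while 10.14.0.155 is in state UJ, and would time out whenever the joining node is slow to acknowledge.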

      Sample stack trace
      …stax.driver.core.exceptions.WriteTimeoutException.copy (WriteTimeoutException.java:73)
      …m.datastax.driver.core.DriverThrowables.propagateCause (DriverThrowables.java:37)
      ….driver.core.DefaultResultSetFuture.getUninterruptibly (DefaultResultSetFuture.java:214)
      com.datastax.driver.core.AbstractSession.execute (AbstractSession.java:52)
      com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadWriteDao$$anonfun$insertCompressed$1.apply(CassandraPagesReadWriteDao.scala:29)
      com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadWriteDao$$anonfun$insertCompressed$1.apply(CassandraPagesReadWriteDao.scala:25)
      com.wixpress.framework.monitoring.metering.SyncMetering$class.tracking(Metering.scala:58)
      com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadOnlyDao.tracking(CassandraPagesReadOnlyDao.scala:19)
      com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadWriteDao.insertCompressed(CassandraPagesReadWriteDao.scala:25)
      com.wixpress.html.data.distributor.core.DaoPageDistributor.com$wixpress$html$data$distributor$core$DaoPageDistributor$$distributePage(DaoPageDistributor.scala:36)
      com.wixpress.html.data.distributor.core.DaoPageDistributor$$anonfun$process$1.apply$mcV$sp(DaoPageDistributor.scala:26)
      com.wixpress.html.data.distributor.core.DaoPageDistributor$$anonfun$process$1.apply(DaoPageDistributor.scala:26)
      com.wixpress.html.data.distributor.core.DaoPageDistributor$$anonfun$process$1.apply(DaoPageDistributor.scala:26)
      com.wixpress.framework.monitoring.metering.SyncMetering$class.tracking(Metering.scala:58)
      com.wixpress.html.data.distributor.core.DaoPageDistributor.tracking(DaoPageDistributor.scala:17)
      com.wixpress.html.data.distributor.core.DaoPageDistributor.process(DaoPageDistributor.scala:25)
      com.wixpress.html.data.distributor.core.greyhound.DistributionRequestHandler.handleMessage(DistributionRequestHandler.scala:19)
      com.wixpress.greyhound.KafkaUserHandlers.handleMessage(UserHandlers.scala:11)
      com.wixpress.greyhound.EventsConsumer.com$wixpress$greyhound$EventsConsumer$$handleMessage(EventsConsumer.scala:51)
      com.wixpress.greyhound.EventsConsumer$$anonfun$com$wixpress$greyhound$EventsConsumer$$dispatch$1.apply$mcV$sp(EventsConsumer.scala:43)
      com.wixpress.greyhound.EventsConsumer$$anonfun$com$wixpress$greyhound$EventsConsumer$$dispatch$1.apply(EventsConsumer.scala:40)
      com.wixpress.greyhound.EventsConsumer$$anonfun$com$wixpress$greyhound$EventsConsumer$$dispatch$1.apply(EventsConsumer.scala:40)
      scala.util.Try$.apply(Try.scala:192)
      com.wixpress.greyhound.EventsConsumer.com$wixpress$greyhound$EventsConsumer$$dispatch(EventsConsumer.scala:40)
      com.wixpress.greyhound.EventsConsumer$$anonfun$consumeEvents$1.apply(EventsConsumer.scala:26)
      com.wixpress.greyhound.EventsConsumer$$anonfun$consumeEvents$1.apply(EventsConsumer.scala:25)
      scala.collection.Iterator$class.foreach(Iterator.scala:742)
      scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
      com.wixpress.greyhound.EventsConsumer.consumeEvents(EventsConsumer.scala:25)
      com.wixpress.greyhound.EventsConsumer.run(EventsConsumer.scala:20)
      java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142)
      java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
      java.lang.Thread.run (Thread.java:745)

      Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_ONE (2 replica were required but only 1 acknowledged the write)
      …stax.driver.core.exceptions.WriteTimeoutException.copy (WriteTimeoutException.java:100)
      com.datastax.driver.core.Responses$Error.asException (Responses.java:98)
      com.datastax.driver.core.DefaultResultSetFuture.onSet (DefaultResultSetFuture.java:149)
      com.datastax.driver.core.RequestHandler.setFinalResult (RequestHandler.java:183)
      com.datastax.driver.core.RequestHandler.access$2300 (RequestHandler.java:44)
      …ore.RequestHandler$SpeculativeExecution.setFinalResult (RequestHandler.java:748)
      ….driver.core.RequestHandler$SpeculativeExecution.onSet (RequestHandler.java:587)
      …atastax.driver.core.Connection$Dispatcher.channelRead0 (Connection.java:1013)
      …atastax.driver.core.Connection$Dispatcher.channelRead0 (Connection.java:936)
      ….netty.channel.SimpleChannelInboundHandler.channelRead (SimpleChannelInboundHandler.java:105)
      …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
      ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
      io.netty.handler.timeout.IdleStateHandler.channelRead (IdleStateHandler.java:254)
      …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
      ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
      …etty.handler.codec.MessageToMessageDecoder.channelRead (MessageToMessageDecoder.java:103)
      …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
      ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
      …etty.handler.codec.MessageToMessageDecoder.channelRead (MessageToMessageDecoder.java:103)
      …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
      ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
      io.netty.handler.codec.ByteToMessageDecoder.channelRead (ByteToMessageDecoder.java:242)
      …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
      ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
      io.netty.channel.DefaultChannelPipeline.fireChannelRead (DefaultChannelPipeline.java:847)
      ….channel.nio.AbstractNioByteChannel$NioByteUnsafe.read (AbstractNioByteChannel.java:131)
      io.netty.channel.nio.NioEventLoop.processSelectedKey (NioEventLoop.java:511)
      ….channel.nio.NioEventLoop.processSelectedKeysOptimized (NioEventLoop.java:468)
      io.netty.channel.nio.NioEventLoop.processSelectedKeys (NioEventLoop.java:382)
      io.netty.channel.nio.NioEventLoop.run (NioEventLoop.java:354)
      ….netty.util.concurrent.SingleThreadEventExecutor$2.run (SingleThreadEventExecutor.java:111)
      java.lang.Thread.run (Thread.java:745)

      Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_ONE (2 replica were required but only 1 acknowledged the write)
      com.datastax.driver.core.Responses$Error$1.decode (Responses.java:57)
      com.datastax.driver.core.Responses$Error$1.decode (Responses.java:37)
      com.datastax.driver.core.Message$ProtocolDecoder.decode (Message.java:213)
      com.datastax.driver.core.Message$ProtocolDecoder.decode (Message.java:204)
      …etty.handler.codec.MessageToMessageDecoder.channelRead (MessageToMessageDecoder.java:89)
      …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
      ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
      …etty.handler.codec.MessageToMessageDecoder.channelRead (MessageToMessageDecoder.java:103)
      …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
      ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
      io.netty.handler.codec.ByteToMessageDecoder.channelRead (ByteToMessageDecoder.java:242)
      …hannel.AbstractChannelHandlerContext.invokeChannelRead (AbstractChannelHandlerContext.java:339)
      ….channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:324)
      io.netty.channel.DefaultChannelPipeline.fireChannelRead (DefaultChannelPipeline.java:847)
      ….channel.nio.AbstractNioByteChannel$NioByteUnsafe.read (AbstractNioByteChannel.java:131)
      io.netty.channel.nio.NioEventLoop.processSelectedKey (NioEventLoop.java:511)
      ….channel.nio.NioEventLoop.processSelectedKeysOptimized (NioEventLoop.java:468)
      io.netty.channel.nio.NioEventLoop.processSelectedKeys (NioEventLoop.java:382)
      io.netty.channel.nio.NioEventLoop.run (NioEventLoop.java:354)
      ….netty.util.concurrent.SingleThreadEventExecutor$2.run (SingleThreadEventExecutor.java:111)
      java.lang.Thread.run (Thread.java:745)

      Output of nodetool status:
        xss = -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+CMSClassUnloadingEnabled -Xms8192M -Xmx8192M -Xmn2048M -Xss256k
        Note: Ownership information does not include topology; for complete information, specify a keyspace
        Datacenter: AUS
        ===============
        Status=Up/Down
        |/ State=Normal/Leaving/Joining/Moving
        --  Address        Load      Tokens  Owns   Host ID                               Rack
        UN  172.16.213.62  85.52 GB  256     11.7%  27f2fd1d-5f3c-4691-a1f6-e28c1343e212  R1
        UN  172.16.213.63  83.11 GB  256     12.2%  4869f14b-e858-46c7-967c-60bd8260a149  R1
        UN  172.16.213.64  80.91 GB  256     11.7%  d4ad2495-cb24-4964-94d2-9e3f557054a4  R1
        UN  172.16.213.66  84.11 GB  256     10.3%  2a16c0dc-c36a-4196-89df-2de4f6b6cae5  R1
        UN  172.16.144.75  95.2 GB   256     11.4%  f87d6518-6c8e-49d9-a013-018bbedb8414  R1
        Datacenter: TAM
        ===============
        Status=Up/Down
        |/ State=Normal/Leaving/Joining/Moving
        --  Address        Load      Tokens  Owns   Host ID                               Rack
        UJ  10.14.0.155    4.38 GB   256     ?      c88bebae-737b-4ade-8f79-64f655036eee  R1
        UN  10.14.0.106    81.57 GB  256     10.0%  3b539927-b53a-4f50-9acd-d92fefbd84b9  R1
        UN  10.14.0.107    80.23 GB  256     10.4%  b70f674d-892f-42ff-a261-5356bee79e99  R1
        UN  10.14.0.108    83.64 GB  256     11.2%  6e24b17a-0b48-46b4-8edb-b0a9206314a3  R1
        UN  10.14.0.109    91.02 GB  256     11.2%  11f02dbd-257f-4623-81f4-b94db7365775  R1
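Per the legend in the nodetool status output, the two-letter code in the first column combines status (U = Up, D = Down) and state (N = Normal, L = Leaving, J = Joining, M = Moving), so 10.14.0.155 in TAM is Up/Joining. As a small aid for spotting joining nodes in output like this, the hypothetical Java helper below decodes those codes and lists the addresses whose state is J. It assumes the whitespace-separated column layout shown in this report, which may differ between nodetool versions; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for reading `nodetool status` rows like the ones in this report.
public class NodetoolStatusSketch {

    // Decodes a two-letter code such as "UJ" into "Up/Joining", following the legend
    // "Status=Up/Down" and "State=Normal/Leaving/Joining/Moving".
    static String decode(String code) {
        if (code.length() != 2) {
            throw new IllegalArgumentException("expected a two-letter code: " + code);
        }
        String status;
        switch (code.charAt(0)) {
            case 'U': status = "Up"; break;
            case 'D': status = "Down"; break;
            default: throw new IllegalArgumentException("unknown status: " + code);
        }
        String state;
        switch (code.charAt(1)) {
            case 'N': state = "Normal"; break;
            case 'L': state = "Leaving"; break;
            case 'J': state = "Joining"; break;
            case 'M': state = "Moving"; break;
            default: throw new IllegalArgumentException("unknown state: " + code);
        }
        return status + "/" + state;
    }

    // Returns the addresses (second column) of rows whose state letter is J.
    static List<String> joiningNodes(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Node rows start with a code such as UN or UJ; headers and legends are skipped.
            if (cols.length >= 2 && cols[0].matches("[UD][NLJM]")
                    && decode(cols[0]).endsWith("Joining")) {
                result.add(cols[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tam = Arrays.asList(
                "--  Address        Load      Tokens  Owns   Host ID  Rack",
                "UJ  10.14.0.155    4.38 GB   256     ?      c88bebae-737b-4ade-8f79-64f655036eee  R1",
                "UN  10.14.0.106    81.57 GB  256     10.0%  3b539927-b53a-4f50-9acd-d92fefbd84b9  R1");
        System.out.println(decode("UJ"));      // Up/Joining
        System.out.println(joiningNodes(tam)); // [10.14.0.155]
    }
}
```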

            People

            • Assignee: Unassigned
            • Reporter: Eyal Sorek (eyalso)
            • Votes: 0
            • Watchers: 5
