Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-14362

Bind to correct local address in 4.0 streaming

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 4.0-alpha1, 4.0
    • None
    • None
    • Normal

    Description

      I'm not sure if anybody has tried using trunk and starting an actual EC2 cluster, but with a 3 node setup that works on 3.11, it fails to work on trunk. I mistakenly stumbled into it while testing my own code changes.

      My setup is as follows:

      broadcast_rpc_address: publicIP
      broadcast_address: publicIP
      listen_address: omitted. Ends up as privateIP.

      Works on 3.11 just fine.

      On trunk though, it never works. My node is never able to join the cluster:

      Apr 03 05:57:06 ip-10-0-47-122 cassandra[13914]: INFO  [main] 2018-04-03 05:57:05,895 RangeStreamer.java:195 - Bootstrap: range (-128373781239966537,-122439194129870521] exists on 52.88.241.181:7000 for keyspace system_traces
      Apr 03 05:57:06 ip-10-0-47-122 cassandra[13914]: INFO  [main] 2018-04-03 05:57:05,895 RangeStreamer.java:195 - Bootstrap: range (6968670424536541270,6973888347502882935] exists on 52.88.241.181:7000 for keyspace system_traces
      Apr 03 05:57:42 ip-10-0-47-122 cassandra[13914]: WARN  [GossipTasks:1] 2018-04-03 05:57:42,298 FailureDetector.java:324 - Not marking nodes down due to local pause of 26215173446ns > 5000000000ns
      Apr 03 05:57:53 ip-10-0-47-122 cassandra[13914]: WARN  [GossipTasks:1] 2018-04-03 05:57:53,035 FailureDetector.java:324 - Not marking nodes down due to local pause of 10736485907ns > 5000000000ns
      Apr 03 05:58:30 ip-10-0-47-122 cassandra[13914]: WARN  [GossipTasks:1] 2018-04-03 05:58:30,790 Gossiper.java:814 - Gossip stage has 28 pending tasks; skipping status check (no nodes will be marked down)
      Apr 03 05:58:33 ip-10-0-47-122 cassandra[13914]: WARN  [GossipTasks:1] 2018-04-03 05:58:33,060 Gossiper.java:814 - Gossip stage has 20 pending tasks; skipping status check (no nodes will be marked down)
      Apr 03 06:04:33 ip-10-0-47-122 cassandra[13914]: WARN  [GossipTasks:1] 2018-04-03 06:04:33,826 FailureDetector.java:324 - Not marking nodes down due to local pause of 400790432954ns > 5000000000ns
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: WARN  [GossipTasks:1] 2018-04-03 06:04:49,133 Gossiper.java:814 - Gossip stage has 2 pending tasks; skipping status check (no nodes will be marked down)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: ERROR [StreamConnectionEstablisher:1] 2018-04-03 06:04:49,138 StreamSession.java:524 - [Stream #d4cd6420-3703-11e8-a6a5-e51ddc10cfe6] Streaming error occurred on session with peer 52.88.241.181:7000
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: java.io.IOException: failed to connect to 52.88.241.181:7000 (STREAM) for streaming data
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at org.apache.cassandra.streaming.DefaultConnectionFactory.createConnection(DefaultConnectionFactory.java:98)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at org.apache.cassandra.streaming.DefaultConnectionFactory.createConnection(DefaultConnectionFactory.java:57)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at org.apache.cassandra.streaming.async.NettyStreamingMessageSender.createChannel(NettyStreamingMessageSender.java:183)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at org.apache.cassandra.streaming.async.NettyStreamingMessageSender.setupControlMessageChannel(NettyStreamingMessageSender.java:165)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at org.apache.cassandra.streaming.async.NettyStreamingMessageSender.sendMessage(NettyStreamingMessageSender.java:222)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at org.apache.cassandra.streaming.async.NettyStreamingMessageSender.initialize(NettyStreamingMessageSender.java:146)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:271)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:273)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at java.lang.Thread.run(Thread.java:748)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: Caused by: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Cannot assign requested address
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.channel.unix.Errors.newIOException(Errors.java:117)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.channel.unix.Socket.bind(Socket.java:266)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.channel.epoll.AbstractEpollStreamChannel.doConnect(AbstractEpollStreamChannel.java:729)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.channel.epoll.EpollSocketChannel.doConnect(EpollSocketChannel.java:184)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.connect(AbstractEpollStreamChannel.java:797)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1274)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:545)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:530)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:999)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:260)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.bootstrap.Bootstrap$3.run(Bootstrap.java:254)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:312)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         ... 1 common frames omitted
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: INFO  [StreamConnectionEstablisher:1] 2018-04-03 06:04:49,140 StreamResultFuture.java:197 - [Stream #d4cd6420-3703-11e8-a6a5-e51ddc10cfe6] Session with 52.88.241.181:7000 is complete
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: ERROR [StreamConnectionEstablisher:1] 2018-04-03 06:04:49,146 StreamSession.java:524 - [Stream #d4cd6420-3703-11e8-a6a5-e51ddc10cfe6] Streaming error occurred on session with peer 52.88.241.181:7000
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]: java.lang.RuntimeException: stream has been closed, cannot send Prepare SYN (3 requests,  0 files}
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at org.apache.cassandra.streaming.async.NettyStreamingMessageSender.sendMessage(NettyStreamingMessageSender.java:209)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at org.apache.cassandra.streaming.StreamSession.onInitializationComplete(StreamSession.java:495)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:272)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:273)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
      Apr 03 06:04:49 ip-10-0-47-122 cassandra[13914]:         at java.lang.Thread.run(Thread.java:748)
      

      Remote debugging it, it eventually seems to come to be due to trying to call socket.bind in doConnect in AbstractEpollStreamChannel. Local address is <publicIP>:0. I think port 0 is fine because it just means pick the next ephemeral port available, but there's no way of binding to the public IP.

      I made a commit https://github.com/apache/cassandra/commit/00222ddf0694f1c35c2bdd84fd7407174c3fc57a that seems to have fixed it so I can continue on with my purposes, and that is have StreamSession use listen address instead of broadcast address which may be public. In the event listen address is public I guess that means if it was able to bind to that interface to begin with then socket.bind should also work.

      Not sure if this is the right way though, and it seems to have been introduced by the Streaming rework to netty. https://issues.apache.org/jira/browse/CASSANDRA-12229.

      Attachments

        Activity

          People

            Lerh Low Lerh Chuan Low
            Lerh Low Lerh Chuan Low
            Lerh Chuan Low
            Jason Brown
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: