Cassandra / CASSANDRA-969

Server fails to join cluster if IPv6 only

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 0.6.1
    • Component/s: Core
    • Labels:
      None
    • Environment:

      Ubuntu 9.10 x64
      java: Java(TM) SE Runtime Environment (build 1.6.0_15-b03)
      cassandra 0.6.0-rc1

      Description

      When Cassandra is configured for IPv6 connectivity on the server-to-server side, adding a second node causes both servers to loop on ArrayIndexOutOfBoundsException for 5 minutes.

      The first server has

      Caused by: java.lang.ArrayIndexOutOfBoundsException: 65536
      at org.apache.cassandra.net.HeaderSerializer.deserialize(Header.java:155)

      While the second has

      Caused by: java.lang.ArrayIndexOutOfBoundsException: 131072
      at org.apache.cassandra.net.HeaderSerializer.deserialize(Header.java:155)

      Note that the index reported on the second server is double the index reported on the first.

      These servers work fine in a cluster together if they are configured for IPv4.

      server1 in the output is 2607:f3d0:0:2::16
      server2 is 2607:f3d0:0:1::f

        Activity

        Cody Lerum added a comment -

        system.log files for both servers

        Cody Lerum added a comment -

        wireshark capture of the networking traffic on server1

        Cody Lerum added a comment -

        this file shows the initial server startup networking traffic as well.

        second server starts up at about 6 seconds in

        Gary Dusbabek added a comment -

        Looks like CompactEndPointSerializationHelper is assuming a 4 byte address during deserialization, but actually writes a full 16 byte IPv6 address during serialization.
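
        A minimal sketch of how this kind of width mismatch can produce the indices in the description. It is not the Cassandra source: the class name, the helper, and the field layout (address, then a UTF string, then an int) are assumptions chosen for illustration. The writer emits all 16 bytes of the sender's IPv6 address, while the reader consumes only 4 and then decodes the leftover address bytes as the next fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;

/** Hypothetical illustration only; names and field layout are assumptions. */
public class FixedWidthAddressMismatch {

    /**
     * Writes a header as: raw address bytes, a UTF string, an int.
     * Reads it back assuming the address is always 4 bytes wide and
     * returns the int the reader ends up seeing.
     */
    static int intSeenAfterAddress(String senderLiteral) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        // getAddress() yields 4 bytes for IPv4 and 16 bytes for IPv6.
        out.write(InetAddress.getByName(senderLiteral).getAddress());
        out.writeUTF("type"); // assumed header field
        out.writeInt(3);      // assumed small array index

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        in.readFully(new byte[4]); // reader assumes an IPv4-sized address
        in.readUTF();              // for IPv6 this parses leftover address bytes
        return in.readInt();       // and so does this
    }

    public static void main(String[] args) throws IOException {
        System.out.println(intSeenAfterAddress("192.0.2.1"));         // 3
        System.out.println(intSeenAfterAddress("2607:f3d0:0:1::f"));  // 65536
        System.out.println(intSeenAfterAddress("2607:f3d0:0:2::16")); // 131072
    }
}
```

        Under that assumed layout, a message from server2 (2607:f3d0:0:1::f) decodes to 65536 and a message from server1 (2607:f3d0:0:2::16) decodes to 131072. This matches the errors each server logs when reading its peer's messages, and the doubling comes from the 1 versus 2 in the fourth address group.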

        Gary Dusbabek added a comment -

        This patch should address your specific problem. What we really need to do is audit the code for this problem. There are quite a few places where we send addresses over the wire.

        Jonathan Ellis added a comment -

        +1

        Cody Lerum added a comment -

        I'll try and recompile later today and test.

        Cody Lerum added a comment -

        I checked out the 0.6 branch and built off that.

        Still getting errors:

        ERROR [MESSAGE-DESERIALIZER-POOL:1] 2010-04-12 15:37:57,020 DebuggableThreadPoolExecutor.java (line 94) Error in executor futuretask
        java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 65536
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
        Caused by: java.lang.ArrayIndexOutOfBoundsException: 65536
        at org.apache.cassandra.net.HeaderSerializer.deserialize(Header.java:155)
        at org.apache.cassandra.net.HeaderSerializer.deserialize(Header.java:113)
        at org.apache.cassandra.net.MessageSerializer.deserialize(Message.java:136)
        at org.apache.cassandra.net.MessageDeserializationTask.run(MessageDeserializationTask.java:45)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        ... 2 more

        Cody Lerum added a comment -

        actually I take that back

        -rw-r--r-- 1 root root 1275022 2010-03-28 09:25 apache-cassandra-0.6.0-rc1.jar

        The lib in my new build is still old.

        Cody Lerum added a comment -

        ok got it built..

        The joining server shows

        ERROR 16:17:52,106 Error in executor futuretask
        java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.EOFException
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
        Caused by: java.lang.RuntimeException: java.io.EOFException
        at org.apache.cassandra.net.MessageDeserializationTask.run(MessageDeserializationTask.java:49)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        ... 2 more
        Caused by: java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readUTF(DataInputStream.java:592)
        at java.io.DataInputStream.readUTF(DataInputStream.java:547)
        at org.apache.cassandra.net.HeaderSerializer.deserialize(Header.java:140)
        at org.apache.cassandra.net.HeaderSerializer.deserialize(Header.java:113)
        at org.apache.cassandra.net.MessageSerializer.deserialize(Message.java:136)
        at org.apache.cassandra.net.MessageDeserializationTask.run(MessageDeserializationTask.java:45)
        ... 6 more

        and the existing server shows

        ERROR 16:17:42,991 Error in executor futuretask
        java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.net.UnknownHostException: addr is of illegal length
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
        Caused by: java.lang.RuntimeException: java.net.UnknownHostException: addr is of illegal length
        at org.apache.cassandra.net.MessageDeserializationTask.run(MessageDeserializationTask.java:49)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        ... 2 more
        Caused by: java.net.UnknownHostException: addr is of illegal length
        at java.net.InetAddress.getByAddress(InetAddress.java:935)
        at java.net.InetAddress.getByAddress(InetAddress.java:1311)
        at org.apache.cassandra.net.CompactEndPointSerializationHelper.deserialize(CompactEndPointSerializationHelper.java:37)
        at org.apache.cassandra.net.HeaderSerializer.deserialize(Header.java:139)
        at org.apache.cassandra.net.HeaderSerializer.deserialize(Header.java:113)
        at org.apache.cassandra.net.MessageSerializer.deserialize(Message.java:136)
        at org.apache.cassandra.net.MessageDeserializationTask.run(MessageDeserializationTask.java:45)
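
        The UnknownHostException in the second trace comes from InetAddress.getByAddress, which accepts only 4-byte (IPv4) or 16-byte (IPv6) arrays. A small sketch, with a hypothetical class and helper name, showing which lengths it accepts:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper for illustration; not part of Cassandra. */
public class AddressLengthCheck {

    /** Returns true if InetAddress.getByAddress accepts a raw address of this length. */
    static boolean isAccepted(int length) {
        try {
            InetAddress.getByAddress(new byte[length]);
            return true;
        } catch (UnknownHostException e) {
            // Thrown for any length other than 4 or 16.
            return false;
        }
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 8, 16, 20}) {
            System.out.println(n + " bytes accepted: " + isAccepted(n));
        }
    }
}
```

        So a deserializer that reads a wrong or garbled length off the wire, for example because the two nodes disagree on the format, would fail at exactly this call.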

        Gary Dusbabek added a comment -

        As I suspected: we have more code that is not IPv6 friendly.

        Cody, can you give me the low-down on how I can turn off IPv4 and test in an environment that is reasonably similar to yours? Assume I have an ubuntu VM at my disposal.

        Cody Lerum added a comment -

        Gary,

        I'm running on Ubuntu with dual stack (both v4 and v6); I am only using IPv6 addresses in the storage and seed portions of storage-conf.xml.

        In your /etc/network/interfaces, simply set

        iface eth0 inet6 static
        address 2607:f3d0:0:2::A
        netmask 64
        gateway 2607:f3d0:0:2::1

        and on the other server

        iface eth0 inet6 static
        address 2607:f3d0:0:2::B
        netmask 64
        gateway 2607:f3d0:0:2::1

        Then in your storage-conf.xml

        <Seeds>
        <Seed>2607:f3d0:0:1::b</Seed>
        </Seeds>
        <ListenAddress>2607:f3d0:0:2::a</ListenAddress>
        <!-- internal communications port -->
        <StoragePort>7000</StoragePort>

        As long as both VMs are on the same network (non-routed) you should be able to test just fine.

        Gary Dusbabek added a comment - - edited

        I am not able to reproduce the latest problem. I went as far as creating a unit test to test CompactEndPointSerializationHelper for all manner of IPv4 and IPv6 addresses. It seems to do the job.

        The EOF in the new stack trace makes me think that one of the nodes might not be running the same code. Cody, can you verify?
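
        The kind of round-trip check described in this comment could look like the sketch below. It assumes a length-prefixed encoding (one length byte, then the raw address bytes). That encoding, the class name, and the method names are illustrative assumptions, not the attached patch or unit test.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;

/** Hypothetical length-prefixed address codec with a round-trip check. */
public class AddressRoundTrip {

    /** Writes one length byte (4 or 16) followed by the raw address bytes. */
    static void serialize(InetAddress addr, DataOutputStream out) throws IOException {
        byte[] bytes = addr.getAddress();
        out.writeByte(bytes.length);
        out.write(bytes);
    }

    /** Reads the length byte, then exactly that many address bytes. */
    static InetAddress deserialize(DataInputStream in) throws IOException {
        byte[] bytes = new byte[in.readUnsignedByte()];
        in.readFully(bytes);
        return InetAddress.getByAddress(bytes);
    }

    /** Serializes and immediately deserializes a single address. */
    static InetAddress roundTrip(InetAddress addr) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        serialize(addr, new DataOutputStream(buf));
        return deserialize(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
    }

    public static void main(String[] args) throws IOException {
        String[] literals = {"127.0.0.1", "192.0.2.1", "::1", "2607:f3d0:0:2::16", "2607:f3d0:0:1::f"};
        for (String literal : literals) {
            InetAddress original = InetAddress.getByName(literal);
            System.out.println(literal + " round-trips: " + roundTrip(original).equals(original));
        }
    }
}
```

        Because the reader takes the width from the stream rather than assuming it, the same code path handles both address families. It only works, though, if every node in the cluster reads and writes the same format.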

        Cody Lerum added a comment -

        I may have screwed up the build. I will try the latest from Hudson

        Cody Lerum added a comment -

        Gary, I tested off http://hudson.zones.apache.org/hudson/job/Cassandra/405/ and it all looks good.

        root@cassandra:/opt/cassandra# bin/cassandra -f
        INFO 22:32:35,577 Auto DiskAccessMode determined to be mmap
        WARN 22:32:35,840 Couldn't detect any schema definitions in local storage. I hope you've got a plan.
        INFO 22:32:35,851 Replaying /var/lib/cassandra/commitlog/CommitLog-1271219308493.log
        INFO 22:32:35,899 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1271219555899.log
        INFO 22:32:35,910 LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1271219555899.log', position=163)
        INFO 22:32:35,911 Enqueuing flush of Memtable(LocationInfo)@1971599989
        INFO 22:32:35,913 Writing Memtable(LocationInfo)@1971599989
        INFO 22:32:36,012 Completed flushing /var/lib/cassandra/data/system/LocationInfo-b-1-Data.db
        INFO 22:32:36,025 Log replay complete
        INFO 22:32:36,051 Saved Token found: 149994310325493222650165912864788358013
        INFO 22:32:36,052 Saved ClusterName found: CLEARFLY-1
        INFO 22:32:36,062 Starting up server gossip
        INFO 22:32:36,110 Binding thrift service to /2607:f3d0:0:2:0:0:0:16:9160
        INFO 22:32:36,115 Cassandra starting up...
        INFO 22:33:35,564 Node /2607:f3d0:0:1:0:0:0:f is now part of the cluster
        INFO 22:33:36,553 InetAddress /2607:f3d0:0:1:0:0:0:f is now UP

        You can close this out.
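
        The addresses in this log are the same ones given in the description, printed in Java's uncompressed text form (2607:f3d0:0:2:0:0:0:16 rather than 2607:f3d0:0:2::16). A short sketch, with a hypothetical class name, showing the mapping:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper for illustration. */
public class Ipv6TextForm {

    /** Returns the InetAddress.toString() form of an IP literal, as it appears in the log. */
    static String asLogged(String literal) throws UnknownHostException {
        return InetAddress.getByName(literal).toString();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(asLogged("2607:f3d0:0:2::16")); // /2607:f3d0:0:2:0:0:0:16
        System.out.println(asLogged("2607:f3d0:0:1::f"));  // /2607:f3d0:0:1:0:0:0:f
    }
}
```

        The leading slash appears because the address was created from a literal and has no host name attached.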


          People

          • Assignee: Gary Dusbabek
          • Reporter: Cody Lerum
          • Votes: 0
          • Watchers: 1
