Cassandra / CASSANDRA-800

Spurious Gossip Up/Down and IO Errors


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • 0.5
    • None
    • None
    • Normal

    Description

      We're seeing a lot of nodes flapping between up and down. It may be caused by a race condition in Gossip.

      On 10.209.23.110:

      WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
      WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
      WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
      Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
      java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
      at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
      at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
      at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
      at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:619)

      On 10.209.23.80, at about the same time:

      ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
      java.util.ConcurrentModificationException
      at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
      at java.util.HashMap$KeyIterator.next(HashMap.java:883)
      at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
      at java.util.HashSet.<init>(HashSet.java:100)
      at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
      at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
      at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
      at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
      at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
      at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
      at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
      at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
      at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
      at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:619)
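The trace above fits the race suspected in the description: HashSet(Collection) copies its source by iterating it (AbstractCollection.addAll), so if Timer-1 presumably removes an endpoint from the Gossiper's live set while a Thrift worker is inside getLiveMembers(), HashMap's fail-fast iterator can throw ConcurrentModificationException. The sketch below replays that interleaving deterministically on a single thread. The class and method names (LiveMembersRace, copyWhileMarkingDead) are hypothetical and not taken from Gossiper.java, and the concurrent key set is only one possible mitigation, not necessarily what the attached 800.txt does. Across real threads an unsynchronized HashMap detects this only on a best-effort basis.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class LiveMembersRace {
    // Hypothetical: copies `live` the way HashSet(Collection) does (iterate + add),
    // but removes `deadEndpoint` right after the first element is copied. This
    // replays, on one thread, Timer-1 marking a node dead mid-copy.
    public static Set<String> copyWhileMarkingDead(Set<String> live, String deadEndpoint) {
        Set<String> copy = new HashSet<>();
        Iterator<String> it = live.iterator();
        boolean removed = false;
        while (it.hasNext()) {
            copy.add(it.next());
            if (!removed) {
                live.remove(deadEndpoint); // structural change during iteration
                removed = true;
            }
        }
        return copy;
    }

    // Fills a set with a few illustrative endpoint strings.
    public static Set<String> seed(Set<String> s) {
        for (int i = 0; i < 8; i++) {
            s.add("10.209.21." + (200 + i));
        }
        return s;
    }

    // Reports whether the replayed interleaving throws ConcurrentModificationException.
    public static boolean throwsCme(Set<String> live) {
        try {
            copyWhileMarkingDead(live, "10.209.21.203");
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Plain HashSet: the fail-fast iterator notices the removal.
        System.out.println("HashSet throws CME: " + throwsCme(seed(new HashSet<>())));
        // ConcurrentHashMap key set: weakly consistent iterator, no CME.
        System.out.println("ConcurrentHashMap key set throws CME: "
                + throwsCme(seed(ConcurrentHashMap.newKeySet())));
    }
}
```

Running main prints `HashSet throws CME: true` followed by `ConcurrentHashMap key set throws CME: false`. Either a concurrent set or copying under the same lock that guards markDead-style updates would avoid the exception.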

      Just before that:

      INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
      INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
      INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
      INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
      INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.

      And just after that:

      INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
      INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
      INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
      INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
      INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
      INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
      INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
      INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
      INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
      INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
      INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
      INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
      INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
      INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
      INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP

      We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:

      git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035

      Attachments

        1. 800.txt (0.8 kB, Jonathan Ellis)


          People

            Assignee: Unassigned
            Reporter: Ryan King (kingryan)
