Details
-
Bug
-
Status: Resolved
-
Normal
-
Resolution: Fixed
-
None
-
None
-
Normal
Description
We're seeing a lot of nodes flapping. It appears to possibly be a race condition in Gossip.
on 10.209.23.110
WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
on 10.209.23.80 about the same time
ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
at java.util.HashMap$KeyIterator.next(HashMap.java:883)
at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
at java.util.HashSet.<init>(HashSet.java:100)
at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
just before that:
INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.
and just after that:
INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP
We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:
git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035