Details
Bug
Status: Open
Normal
Resolution: Unresolved
Cassandra 2.1.13, Ubuntu 14.04.5 LTS, Docker version 1.9.1, run as a container, 4-core server with 16GB memory.
Normal
Description
Initially saw the following exception numerous times:
WARN [SharedPool-Worker-8] 2017-05-09 23:04:00,018 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-8,5,main]: {}
java.lang.NullPointerException: null
    at java.lang.Double.compareTo(Double.java:49) ~[na:1.8.0_101]
    at java.util.concurrent.ConcurrentSkipListMap.cpr(ConcurrentSkipListMap.java:655) ~[na:1.8.0_101]
    at java.util.concurrent.ConcurrentSkipListMap.doPut(ConcurrentSkipListMap.java:835) ~[na:1.8.0_101]
    at java.util.concurrent.ConcurrentSkipListMap.putIfAbsent(ConcurrentSkipListMap.java:1962) ~[na:1.8.0_101]
    at com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:104) ~[metrics-core-2.2.0.jar:na]
    at com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:81) ~[metrics-core-2.2.0.jar:na]
    at com.yammer.metrics.core.Histogram.update(Histogram.java:110) ~[metrics-core-2.2.0.jar:na]
    at com.yammer.metrics.core.Timer.update(Timer.java:198) ~[metrics-core-2.2.0.jar:na]
    at com.yammer.metrics.core.Timer.update(Timer.java:76) ~[metrics-core-2.2.0.jar:na]
    at org.apache.cassandra.metrics.LatencyMetrics.addNano(LatencyMetrics.java:108) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
    at org.apache.cassandra.metrics.LatencyMetrics.addNano(LatencyMetrics.java:114) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1863) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
    at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:353) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
    at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:53) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
    at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
    at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [cassandra-all-2.1.13.1218.jar:2.1.13.1218]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
This then led to a high rate of these warnings:
WARN [SharedPool-Worker-91] 2017-05-09 23:04:14,682 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-91,5,main]: {}
java.lang.ClassCastException: null
WARN [SharedPool-Worker-92] 2017-05-09 23:04:14,704 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-92,5,main]: {}
java.lang.RuntimeException: java.lang.ClassCastException
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2244) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
    at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [cassandra-all-2.1.13.1218.jar:2.1.13.1218]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
The same errors continued until the last error reported:
WARN [SharedPool-Worker-161] 2017-05-09 23:06:18,617 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-161,5,main]: {}
java.lang.RuntimeException: java.lang.ClassCastException
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2244) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
    at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [cassandra-all-2.1.13.1218.jar:2.1.13.1218]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
Caused by: java.lang.ClassCastException: null
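For anyone trying to quantify how the warnings above progressed, a rough sketch of a tally by exception type might look like the following. The class name `UncaughtTally` and the regular expression are assumptions made for illustration, based only on the log line shape shown in this report; they are not part of Cassandra and may need adjusting for other log layouts.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cassandra: counts uncaught exceptions by class name.
public class UncaughtTally {
    // Assumed line shape: "... Uncaught exception on thread Thread[name,prio,group]: {}"
    // followed (on the same or the next line) by the exception class name.
    private static final Pattern UNCAUGHT = Pattern.compile(
        "Uncaught exception on thread Thread\\[[^\\]]+\\]: \\{\\}\\s+([\\w.$]+)");

    static Map<String, Integer> tally(String logText) {
        // LinkedHashMap keeps the order in which exception types first appeared.
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = UNCAUGHT.matcher(logText);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java UncaughtTally <path-to-log>");
            return;
        }
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        tally(text).forEach((type, n) -> System.out.println(n + "\t" + type));
    }
}
```

Run against the full system log, this would show whether the NullPointerException appeared only at the start and how many ClassCastException warnings followed before the crash.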
At that point the JVM crashed completely and exited. This is an extract from error.log:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fe80626a955, pid=79, tid=0x00007fe80433d700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 1.8.0_101-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V [libjvm.so+0x5c3955] G1ParScanThreadState::copy_to_survivor_space(InCSetState, oopDesc*, markOopDesc*)+0x45
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#

--------------- T H R E A D ---------------

Current thread (0x00007fe800035800): GCTaskThread [stack: 0x00007fe80423d000,0x00007fe80433e000] [id=256]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000030b252310

Registers:
RAX=0x00007fe806c34b90, RBX=0x00007fe806c34b80, RCX=0x0000000000000003, RDX=0x0000000000000001
RSP=0x00007fe80433c2d0, RBP=0x00007fe80433c350, RSI=0x0000000000000001, RDI=0x000000030b252308
R8 =0x00007fe80002e8f0, R9 =0x00000000f7b096ba, R10=0x00000006f1f78f50, R11=0x00007fe80433c5a0
R12=0x00000005cd63de64, R13=0x00000007bd84b5d0, R14=0x00007fe80433c5a0, R15=0x000000000000378f
RIP=0x00007fe80626a955, EFLAGS=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007fe80433c2d0)
0x00007fe80433c2d0: 00000006f1f78f50 746e756f63a5826c
0x00007fe80433c2e0: 01007fe800030c80 0000000100000000
0x00007fe80433c2f0: 00007fe80433c320 0000000c06639e22
0x00007fe80433c300: 00007fe800030c80 00007fe800030c80
0x00007fe80433c310: 00007fe80433c5b0 01007fe7e4a7c1f0
0x00007fe80433c320: 00007fe80433c350 00007fe806c34b80
0x00007fe80433c330: 00000005cd63de64 00007fe8000ba970
0x00007fe80433c340: 00007fe80433c5a0 000000000000378f
0x00007fe80433c350: 00007fe80433c430 00007fe80626b50b
0x00007fe80433c360: 00007fe80433c3f0 00007fe80433caa0
0x00007fe80433c370: 00007fe80433c390 00007fe80433c3d0
0x00007fe80433c380: 00007fe80433c3c0 00007fe80433c3b0
0x00007fe80433c390: 00007fe80433c3e0 00007fe80433c5b0
0x00007fe80433c3a0: 00007fe80433c710 00007fe80433c3f0
0x00007fe80433c3b0: 00007fe806c0c120 00007fe800030d50
0x00007fe80433c3c0: 0000000727504742 0000000000000000
0x00007fe80433c3d0: 0000000000000000 0000000000000800
0x00007fe80433c3e0: 00007fe7c819b400 00007fe80433c4b0
0x00007fe80433c3f0: 00000005cd63de65 00007fe80433c4b0
0x00007fe80433c400: 00007fe80433ca00 00007fe80433caa0
0x00007fe80433c410: 0000000000000000 00007fe80433c8d0
0x00007fe80433c420: 00007fe80433c5a0 00007fe80433ca00
0x00007fe80433c430: 00007fe80433c500 00007fe806245d17
0x00007fe80433c440: 00007fe80433c460 00007fe80625fb18
0x00007fe80433c450: 00007fe80433ca00 0000000000000000
0x00007fe80433c460: 00007fe80433c500 00007fe806271049
0x00007fe80433c470: 00007fe806c001d0 00007fe80433cb20
0x00007fe80433c480: 00007fe806c001f0 00000000043c9800
0x00007fe80433c490: 00007fe80002e8f0 00007fe80433ca00
0x00007fe80433c4a0: 00007fe7f203ea50 00007fe80433caa0
0x00007fe80433c4b0: 0000000000000000 00007fe80433c8d0
0x00007fe80433c4c0: 00007fe7d0bdf580 00007fe80433ca00

Instructions: (pc=0x00007fe80626a955)
0x00007fe80626a935: 88 0f b6 10 84 d2 0f 84 3f 01 00 00 48 8b 05 40
0x00007fe80626a945: b1 9a 00 41 8b 7d 08 8b 48 08 48 d3 e7 48 03 38
0x00007fe80626a955: 8b 77 08 83 fe 00 0f 8e 2f 01 00 00 40 f6 c6 01
0x00007fe80626a965: 0f 85 35 01 00 00 89 f0 c1 f8 03 4c 63 f8 49 8b

Register to memory mapping:

RAX=0x00007fe806c34b90: <offset 0xf8db90> in /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so at 0x00007fe805ca7000
RBX=0x00007fe806c34b80: <offset 0xf8db80> in /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so at 0x00007fe805ca7000
RCX=0x0000000000000003 is an unknown value
RDX=0x0000000000000001 is an unknown value
RSP=0x00007fe80433c2d0 is an unknown value
RBP=0x00007fe80433c350 is an unknown value
RSI=0x0000000000000001 is an unknown value
RDI=0x000000030b252308 is an unknown value
R8 =0x00007fe80002e8f0 is an unknown value
R9 =0x00000000f7b096ba is an unknown value
R10=0x00000006f1f78f50 is an oop
java.util.concurrent.ConcurrentSkipListMap$Node
 - klass: 'java/util/concurrent/ConcurrentSkipListMap$Node'
R11=0x00007fe80433c5a0 is an unknown value
R12=0x00000005cd63de64 is pointing into object: 0x00000005cd63de50
java.util.concurrent.ConcurrentSkipListMap$Node
 - klass: 'java/util/concurrent/ConcurrentSkipListMap$Node'
R13=0x00000007bd84b5d0 is pointing into object: 0x00000007bd83c1e8
[B
 - klass: {type array byte}
 - length: 65536
R14=0x00007fe80433c5a0 is an unknown value
R15=0x000000000000378f is an unknown value

Stack: [0x00007fe80423d000,0x00007fe80433e000], sp=0x00007fe80433c2d0, free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x5c3955] G1ParScanThreadState::copy_to_survivor_space(InCSetState, oopDesc*, markOopDesc*)+0x45
V [libjvm.so+0x5c450b] G1ParScanThreadState::trim_queue()+0x4ab
V [libjvm.so+0x59ed17] G1ParEvacuateFollowersClosure::do_void()+0x27
V [libjvm.so+0x5aa923] G1ParTask::work(unsigned int)+0x463
V [libjvm.so+0xae5a6f] GangWorker::loop()+0xcf
V [libjvm.so+0x924698] java_start(Thread*)+0x108

--------------- P R O C E S S ---------------

Java Threads: ( => current thread )
  0x00007fe543263000 JavaThread "MemtablePostFlush:36560" daemon [_thread_blocked, id=9747, stack(0x00007fe7070c5000,0x00007fe707106000)]
  0x00007fe540c31800 JavaThread "ReadRepairStage:33062" daemon [_thread_blocked, id=9746, stack(0x00007fe7c8286000,0x00007fe7c82c7000)]
  0x00007fe7c4ced800 JavaThread "CompactionExecutor:16930" daemon [_thread_blocked, id=9743, stack(0x00007fe7c89e8000,0x00007fe7c8a29000)]
  0x00007fe802ffa000 JavaThread "CompactionExecutor:16929" daemon [_thread_blocked, id=9742, stack(0x00007fe7c3005000,0x00007fe7c3046000)]
  0x00007fe4f4f09800 JavaThread "ReadRepairStage:33061" daemon [_thread_blocked, id=9739, stack(0x00007fe52eab6000,0x00007fe52eaf7000)]
  0x00007fe64a846000 JavaThread "ReadRepairStage:33059" daemon [_thread_blocked, id=9737, stack(0x00007fe7c2f83000,0x00007fe7c2fc4000)]
  0x00007fe64a823000 JavaThread "ReadRepairStage:33057" daemon [_thread_blocked, id=9735, stack(0x00007fe7c8245000,0x00007fe7c8286000)]
  0x00007fe540cac000 JavaThread "StreamConnectionEstablisher:4" daemon [_thread_blocked, id=31748, stack(0x00007fe7c325d000,0x00007fe7c329e000)]
  0x00007fe521bb4000 JavaThread "StreamingTransferTaskTimeouts:1" daemon [_thread_blocked, id=31693, stack(0x00007fe7c81ad000,0x00007fe7c81ee000)]
  0x00007fe50e479000 JavaThread "StreamConnectionEstablisher:3" daemon [_thread_blocked, id=31688, stack(0x00007fe7c9926000,0x00007fe7c9967000)]
  0x00007fe4f729c000 JavaThread "StreamConnectionEstablisher:2" daemon [_thread_blocked, id=31686, stack(0x00007fe7c9dee000,0x00007fe7c9e2f000)]
  0x00007fe4f5087800 JavaThread "StreamConnectionEstablisher:1" daemon [_thread_blocked, id=31681, stack(0x00007fe707f50000,0x00007fe707f91000)]
  0x00007fe7151a1000 JavaThread "MessagingService-Incoming-/52.221.228.170" [_thread_in_native, id=16239, stack(0x00007fe707f91000,0x00007fe707fd2000)]
  0x00007fe716a41800 JavaThread "MessagingService-Incoming-/54.169.103.14" [_thread_in_native, id=16235, stack(0x00007fe7c1f8f000,0x00007fe7c1fd0000)]
  0x00007fe50e38b800 JavaThread "MessagingService-Incoming-/54.179.183.26" [_thread_in_native, id=16231, stack(0x00007fe7c8055000,0x00007fe7c8096000)]
  0x00007fe7ecaf5000 JavaThread "MessagingService-Incoming-/52.221.228.170" [_thread_blocked, id=16230, stack(0x00007fe7bfaf0000,0x00007fe7bfb31000)]
  0x00007fe50fc71000 JavaThread "MessagingService-Incoming-/54.169.103.14" [_thread_blocked, id=16229, stack(0x00007fe7c2b39000,0x00007fe7c2b7a000)]
  0x00007fe71748a800 JavaThread "MessagingService-Incoming-/52.221.217.27" [_thread_in_native, id=16226, stack(0x00007fe7c846d000,0x00007fe7c84ae000)]
  0x00007fe50f5ff000 JavaThread "MessagingService-Incoming-/52.221.217.27" [_thread_blocked, id=16223, stack(0x00007fe53fbf0000,0x00007fe53fc31000)]
  0x00007fe71732f000 JavaThread "MessagingService-Incoming-/54.179.183.26" [_thread_blocked, id=16222, stack(0x00007fe707188000,0x00007fe7071c9000)]
  0x00007fe521ea9000 JavaThread "SharedPool-Worker-1641" daemon [_thread_blocked, id=7064, stack(0x00007fe7c84ef000,0x00007fe7c8530000)]
  0x00007fe50dee4800 JavaThread "SharedPool-Worker-1638" daemon [_thread_blocked, id=7063, stack(0x00007fe7c8e04000,0x00007fe7c8e45000)]
  0x00007fe521fda800 JavaThread "SharedPool-Worker-1637" daemon [_thread_blocked, id=7062, stack(0x00007fe7c90d3000,0x00007fe7c9114000)]
  0x00007fe52090d800 JavaThread "SharedPool-Worker-1639" daemon [_thread_blocked, id=7061, stack(0x00007fe7c9167000,0x00007fe7c91a8000)]
  0x00007fe542cc3800 JavaThread "SharedPool-Worker-1640" daemon [_thread_blocked, id=7060, stack(0x00007fe7c9386000,0x00007fe7c93c7000)]
  0x00007fe543ea2000 JavaThread "SharedPool-Worker-1621" daemon [_thread_blocked, id=7059, stack(0x00007fe7c93c7000,0x00007fe7c9408000)]
  0x00007fe52337e000 JavaThread "SharedPool-Worker-1623" daemon [_thread_blocked, id=7058, stack(0x00007fe7c9408000,0x00007fe7c9449000)]
  0x00007fe50de15000 JavaThread "SharedPool-Worker-1625" daemon [_thread_blocked, id=7057, stack(0x00007fe7c9449000,0x00007fe7c948a000)]
  0x00007fe52290d000 JavaThread "SharedPool-Worker-1627" daemon [_thread_blocked, id=7056, stack(0x00007fe7c94bb000,0x00007fe7c94fc000)]
  0x00007fe5216e5800 JavaThread "SharedPool-Worker-1629" daemon [_thread_blocked, id=7055, stack(0x00007fe7c9724000,0x00007fe7c9765000)]
  0x00007fe5208df000 JavaThread "SharedPool-Worker-1631" daemon [_thread_blocked, id=7054, stack(0x00007fe7c9765000,0x00007fe7c97a6000)]
  0x00007fe714b44000 JavaThread "SharedPool-Worker-1633" daemon [_thread_blocked, id=7053, stack(0x00007fe7c97a6000,0x00007fe7c97e7000)]
  0x00007fe521230000 JavaThread "SharedPool-Worker-1635" daemon [_thread_blocked, id=7052, stack(0x00007fe7c99df000,0x00007fe7c9a20000)]
  0x00007fe542898000 JavaThread "SharedPool-Worker-1619" daemon [_thread_blocked, id=7051, stack(0x00007fe7c9a20000,0x00007fe7c9a61000)]
  0x00007fe714fa3000 JavaThread "SharedPool-Worker-1636" daemon [_thread_blocked, id=7050, stack(0x00007fe7c9ac8000,0x00007fe7c9b09000)]
  0x00007fe5213f7000 JavaThread "SharedPool-Worker-1634" daemon [_thread_blocked, id=7049, stack(0x00007fe7c9b09000,0x00007fe7c9b4a000)]
  0x00007fe7edc55000 JavaThread "SharedPool-Worker-1632" daemon [_thread_blocked, id=7048, stack(0x00007fe7c9b4a000,0x00007fe7c9b8b000)]
  0x00007fe50ccb3000 JavaThread "SharedPool-Worker-1630" daemon [_thread_blocked, id=7047, stack(0x00007fe7c9b8b000,0x00007fe7c9bcc000)]
  0x00007fe50c641800 JavaThread "SharedPool-Worker-1628" daemon [_thread_blocked, id=7046, stack(0x00007fe7c9cac000,0x00007fe7c9ced000)]
...
Complete JVM crash report at https://dl.dropboxusercontent.com/u/1575409/Ably/logs/2017-05-10-cassandra-crash/us-west-1/error.log
I also have the entire Cassandra log from the time if that is useful, although nothing was logged for a few minutes before this happened, so there is no clear indication of what triggered it.
There were no CPU, load, or memory issues at the time (the crash occurred at 2017-05-09 23:06).