Cassandra / CASSANDRA-19668

SIGSEGV originating in Paxos V2 Scheduled Task


Details

    Description

      I haven't gotten to the root cause of this yet. Several 4.1 nodes have crashed in production. I'm not sure whether this is related to Paxos v2, but it is enabled. offheap_objects is also enabled.

      I'm not sure yet whether this affects 5.0.

      Most of the crash logs don't contain a full stack trace; they only reference this:

      Stack: [0x00007fabf4c34000,0x00007fabf4d34000],  sp=0x00007fabf4d31f00,  free space=1015k
      Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
      v  ~StubRoutines::jint_disjoint_arraycopy
      
      

      All of them are in the ScheduledTasks thread.

      However, one node does have this in the crash log:

      ---------------  T H R E A D  ---------------
      
      Current thread (0x000078b375eac800):  JavaThread "ScheduledTasks:1" daemon [_thread_in_Java, id=151791, stack(0x000078b34b780000,0x000078b34b880000)]
      
      Stack: [0x000078b34b780000,0x000078b34b880000],  sp=0x000078b34b87c350,  free space=1008k
      Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
      J 29467 c2 org.apache.cassandra.db.rows.AbstractCell.clone(Lorg/apache/cassandra/utils/memory/ByteBufferCloner;)Lorg/apache/cassandra/db/rows/Cell; (50 bytes) @ 0x000078b3dd40a42f [0x000078b3dd409de0+0x000000000000064f]
      J 17669 c2 org.apache.cassandra.db.rows.Cell.clone(Lorg/apache/cassandra/utils/memory/Cloner;)Lorg/apache/cassandra/db/rows/ColumnData; (6 bytes) @ 0x000078b3dc54edc0 [0x000078b3dc54ed40+0x0000000000000080]
      J 17816 c2 org.apache.cassandra.db.rows.BTreeRow$$Lambda$845.apply(Ljava/lang/Object;)Ljava/lang/Object; (12 bytes) @ 0x000078b3dbed01a4 [0x000078b3dbed0120+0x0000000000000084]
      J 17828 c2 org.apache.cassandra.utils.btree.BTree.transform([Ljava/lang/Object;Ljava/util/function/Function;)[Ljava/lang/Object; (194 bytes) @ 0x000078b3dc5f35f0 [0x000078b3dc5f34a0+0x0000000000000150]
      J 35096 c2 org.apache.cassandra.db.rows.BTreeRow.clone(Lorg/apache/cassandra/utils/memory/Cloner;)Lorg/apache/cassandra/db/rows/Row; (37 bytes) @ 0x000078b3dda9111c [0x000078b3dda90fe0+0x000000000000013c]
      J 30500 c2 org.apache.cassandra.utils.memory.EnsureOnHeap$CloneToHeap.applyToRow(Lorg/apache/cassandra/db/rows/Row;)Lorg/apache/cassandra/db/rows/Row; (16 bytes) @ 0x000078b3dd59b91c [0x000078b3dd59b8c0+0x000000000000005c]
      J 26498 c2 org.apache.cassandra.db.transform.BaseRows.hasNext()Z (215 bytes) @ 0x000078b3dcf1c454 [0x000078b3dcf1c180+0x00000000000002d4]
      J 30775 c2 org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext()Ljava/lang/Object; (49 bytes) @ 0x000078b3dc789020 [0x000078b3dc788fc0+0x0000000000000060]
      J 9082 c2 org.apache.cassandra.utils.AbstractIterator.hasNext()Z (80 bytes) @ 0x000078b3dbb3c544 [0x000078b3dbb3c440+0x0000000000000104]
      J 35593 c2 org.apache.cassandra.service.paxos.uncommitted.PaxosRows$PaxosMemtableToKeyStateIterator.computeNext()Lorg/apache/cassandra/service/paxos/uncommitted/PaxosKeyState; (126 bytes) @ 0x000078b3dc7ceeec [0x000078b3dc7cee20+0x00000000000000cc]
      J 35591 c2 org.apache.cassandra.service.paxos.uncommitted.PaxosRows$PaxosMemtableToKeyStateIterator.computeNext()Ljava/lang/Object; (5 bytes) @ 0x000078b3dc7d09e4 [0x000078b3dc7d09a0+0x0000000000000044]
      J 9082 c2 org.apache.cassandra.utils.AbstractIterator.hasNext()Z (80 bytes) @ 0x000078b3dbb3c544 [0x000078b3dbb3c440+0x0000000000000104]
      J 34146 c2 com.google.common.collect.Iterators.addAll(Ljava/util/Collection;Ljava/util/Iterator;)Z (41 bytes) @ 0x000078b3dd9197e8 [0x000078b3dd919680+0x0000000000000168]
      J 38256 c1 org.apache.cassandra.service.paxos.uncommitted.PaxosRows.toIterator(Lorg/apache/cassandra/db/partitions/UnfilteredPartitionIterator;Lorg/apache/cassandra/schema/TableId;Z)Lorg/apache/cassandra/utils/CloseableIterator; (49 bytes) @ 0x000078b3d6b677ac [0x000078b3d6b672e0+0x00000000000004cc]
      J 34823 c1 org.apache.cassandra.service.paxos.uncommitted.PaxosUncommittedIndex.repairIterator(Lorg/apache/cassandra/schema/TableId;Ljava/util/Collection;)Lorg/apache/cassandra/utils/CloseableIterator; (212 bytes) @ 0x000078b3d5675e0c [0x000078b3d5673be0+0x000000000000222c]
      J 38259 c1 org.apache.cassandra.service.paxos.uncommitted.PaxosUncommittedTracker.uncommittedKeyIterator(Lorg/apache/cassandra/schema/TableId;Ljava/util/Collection;)Lorg/apache/cassandra/utils/CloseableIterator; (116 bytes) @ 0x000078b3d6b6bc54 [0x000078b3d6b6b7e0+0x0000000000000474]
      J 38257 c1 org.apache.cassandra.service.StorageService.autoRepairPaxos(Lorg/apache/cassandra/schema/TableId;)Lorg/apache/cassandra/utils/concurrent/Future; (57 bytes) @ 0x000078b3d6b6902c [0x000078b3d6b68e00+0x000000000000022c]
      j  org.apache.cassandra.service.paxos.uncommitted.PaxosUncommittedTracker.schedulePaxosAutoRepairs()V+146
      j  org.apache.cassandra.service.paxos.uncommitted.PaxosUncommittedTracker$$Lambda$1773.run()V+4
      J 39703 c1 org.apache.cassandra.service.paxos.uncommitted.PaxosUncommittedTracker.runAndLogException(Ljava/lang/String;Ljava/lang/Runnable;)V (39 bytes) @ 0x000078b3d435adfc [0x000078b3d435ad00+0x00000000000000fc]
      j  org.apache.cassandra.service.paxos.uncommitted.PaxosUncommittedTracker.maintenance()V+19
      j  org.apache.cassandra.service.paxos.uncommitted.PaxosUncommittedTracker$$Lambda$1534.run()V+4
      J 30376 c2 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V java.base@11.0.22 (57 bytes) @ 0x000078b3dd56543c [0x000078b3dd565100+0x000000000000033c]
      J 27255% c2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base@11.0.22 (187 bytes) @ 0x000078b3dd114d58 [0x000078b3dd114ac0+0x0000000000000298]
      j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@11.0.22
      j  io.netty.util.concurrent.FastThreadLocalRunnable.run()V+4
      j  java.lang.Thread.run()V+11 java.base@11.0.22
      v  ~StubRoutines::call_stub
      V  [libjvm.so+0x877453]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x373
      V  [libjvm.so+0x875a96]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0x186
      V  [libjvm.so+0x925653]  thread_entry(JavaThread*, Thread*)+0xa3
      V  [libjvm.so+0xe41391]  JavaThread::thread_main_inner()+0x131
      V  [libjvm.so+0xe3d790]  Thread::call_run()+0x140
      V  [libjvm.so+0xbf97de]  thread_native_entry(Thread*)+0xee
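
      For triage across several crash logs, it can help to reduce each trace to its Java frames and compare the call paths. The following is a minimal sketch, not part of Cassandra or the JDK: the helper names java_frames and short_name are hypothetical, and the pattern assumes frame lines shaped like the ones in the trace above (a leading J or j, an optional compile id and tier, then the qualified method name).

```python
import re

# Matches hs_err frame lines such as:
#   J 29467 c2 org.apache...AbstractCell.clone(L...;)L...; (50 bytes) @ 0x...
#   j  org.apache...PaxosUncommittedTracker.maintenance()V+19
# Frames marked v/V (VM code) or C (native code) are deliberately skipped.
FRAME_RE = re.compile(r"^\s*[Jj]\s+(?:\d+%?\s+c[12]\s+)?([\w.$]+)\(")


def java_frames(hs_err_text):
    """Return qualified method names of the Java frames, innermost first."""
    frames = []
    for line in hs_err_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group(1))
    return frames


def short_name(qualified):
    """Reduce 'pkg.sub.Class.method' to 'Class.method'."""
    return ".".join(qualified.rsplit(".", 2)[-2:])


if __name__ == "__main__":
    sample = (
        "J 29467 c2 org.apache.cassandra.db.rows.AbstractCell.clone"
        "(Lorg/apache/cassandra/utils/memory/ByteBufferCloner;)"
        "Lorg/apache/cassandra/db/rows/Cell; (50 bytes) @ 0x000078b3dd40a42f\n"
        "j  org.apache.cassandra.service.paxos.uncommitted."
        "PaxosUncommittedTracker.maintenance()V+19\n"
        "V  [libjvm.so+0xbf97de]  thread_native_entry(Thread*)+0xee\n"
    )
    for frame in java_frames(sample):
        print(short_name(frame))
```

      Run against the full trace above, this would list the path from the scheduled PaxosUncommittedTracker maintenance task down to AbstractCell.clone, which makes it easier to check whether the other nodes' logs, where they contain Java frames at all, show the same path.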
      

      Attachments

        1. ci_summar-5.0.html (183 kB, Blake Eggleston)
        2. ci_summary-4.1.html (29.79 MB, Blake Eggleston)
        3. ci_summary-trunk.html (148 kB, Blake Eggleston)

          People

            rustyrazorblade Jon Haddad
            Blake Eggleston