Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Fixed
- 1.2.0
Description
After installSnapshot, the bootstrapped SCM crashed shortly afterwards while a write workload was ongoing.
According to the core dump file, the newly added SCM crashed in the StateMachineUpdater thread while accessing RocksDB.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fcefbb5fc0f, pid=1406, tid=0x00007fceecbcb700
#
# JRE version: OpenJDK Runtime Environment (8.0_232) (build 1.8.0_232-86)
# Java VM: OpenJDK 64-Bit Server VM (25.232-b86 mixed mode, sharing linux-amd64 compressed oops)
# Problematic frame:
# C [librocksdbjni7209090472417999125.so+0x242c0f] rocksdb_get_helper(JNIEnv_*, rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, _jbyteArray*, int, int)+0xcf
#
# Core dump written. Default location: /root/core or core.1406
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

--------------- T H R E A D ---------------

Current thread (0x00007fcf3ded2800): JavaThread "7a85dabc-3f8c-47e1-bf0a-de75abe92820@group-691FBC3A273C-StateMachineUpdater" daemon [_thread_in_native, id=1559, stack(0x00007fceecacb000,0x00007fceecbcc000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000

Stack: [0x00007fceecacb000,0x00007fceecbcc000], sp=0x00007fceecbc96a0, free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [librocksdbjni7209090472417999125.so+0x242c0f] rocksdb_get_helper(JNIEnv_*, rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, _jbyteArray*, int, int)+0xcf
C [librocksdbjni7209090472417999125.so+0x242ea2] Java_org_rocksdb_RocksDB_get__J_3BIIJ+0x62
j org.rocksdb.RocksDB.get(J[BIIJ)[B+0
j org.rocksdb.RocksDB.get(Lorg/rocksdb/ColumnFamilyHandle;[B)[B+13
j org.apache.hadoop.hdds.utils.db.RDBTable.get([B)[B+9
j org.apache.hadoop.hdds.utils.db.RDBTable.get(Ljava/lang/Object;)Ljava/lang/Object;+5
j org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(Ljava/lang/Object;)Ljava/lang/Object;+14
j org.apache.hadoop.hdds.utils.db.TypedTable.get(Ljava/lang/Object;)Ljava/lang/Object;+61
j org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl.lambda$allocateBatch$0(Ljava/lang/String;)Ljava/lang/Long;+5
j org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl$$Lambda$444.apply(Ljava/lang/Object;)Ljava/lang/Object;+8
J 3481 C1 java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Ljava/lang/Object;Ljava/util/function/Function;)Ljava/lang/Object; (493 bytes) @ 0x00007fcf2daeb9e4 [0x00007fcf2daeb160+0x884]
j org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl.allocateBatch(Ljava/lang/String;Ljava/lang/Long;Ljava/lang/Long;)Ljava/lang/Boolean;+11
v ~StubRoutines::call_stub
V [libjvm.so+0x682be8] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1048
V [libjvm.so+0x9a9b49] Reflection::invoke(instanceKlassHandle, methodHandle, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, Thread*)+0x599
V [libjvm.so+0x9ad7ed] Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, Thread*)+0x14d
V [libjvm.so+0x725a66] JVM_InvokeMethod+0x1e6
J 2759 sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (0 bytes) @ 0x00007fcf2d2a827d [0x00007fcf2d2a8180+0xfd]
J 2758 C1 sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (104 bytes) @ 0x00007fcf2d33f194 [0x00007fcf2d33dec0+0x12d4]
J 5190 C2 sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (10 bytes) @ 0x00007fcf2dff1968 [0x00007fcf2dff1920+0x48]
j java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+56
j org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(Lorg/apache/hadoop/hdds/scm/ha/SCMRatisRequest;)Lorg/apache/ratis/protocol/Message;+68
j org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(Lorg/apache/ratis/statemachine/TransactionContext;)Ljava/util/concurrent/CompletableFuture;+27
j org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(Lorg/apache/ratis/proto/RaftProtos$LogEntryProto;)Ljava/util/concurrent/CompletableFuture;+126
j org.apache.ratis.server.impl.StateMachineUpdater.applyLog()Lorg/apache/ratis/util/MemoizedSupplier;+142
j org.apache.ratis.server.impl.StateMachineUpdater.run()V+29
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
V [libjvm.so+0x682be8] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1048
V [libjvm.so+0x684127] JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x2f7
V [libjvm.so+0x684660] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x60
V [libjvm.so+0x71c121] thread_entry(JavaThread*, Thread*)+0x91
V [libjvm.so+0xa8c671] JavaThread::thread_main_inner()+0xf1
V [libjvm.so+0x938f12] java_start(Thread*)+0x132
C [libpthread.so.0+0x7eb5] start_thread+0xc5
The root cause is a missing reinitialize() in SequenceIdGenerator: after the snapshot is installed, SequenceIdGenerator still holds a dangling reference to the old RocksDB instance, which has already been removed.
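The failure mode can be illustrated with a minimal, self-contained sketch. It does not use the Ozone or RocksDB APIs: `FakeTable`, `IdGenerator`, and `installSnapshot` are hypothetical stand-ins invented for this example, and the closed-handle check throws an exception where the real native library would crash the JVM with SIGSEGV.

```java
import java.util.HashMap;
import java.util.Map;

public class ReinitializeSketch {

  /** Hypothetical stand-in for a RocksDB-backed table (not the Ozone API). */
  static final class FakeTable {
    private final Map<String, Long> data = new HashMap<>();
    private boolean closed;

    Long get(String key) {
      if (closed) {
        // Assumption for illustration: the real native handle has no such
        // guard, so a read through it can segfault instead of throwing.
        throw new IllegalStateException("table is closed");
      }
      return data.get(key);
    }

    void put(String key, long value) {
      data.put(key, value);
    }

    void close() {
      closed = true;
    }
  }

  /** Hypothetical component that caches a table handle. */
  static final class IdGenerator {
    private FakeTable table;

    IdGenerator(FakeTable table) {
      this.table = table;
    }

    /** Swaps the cached handle for the one backed by the new store. */
    void reinitialize(FakeTable newTable) {
      this.table = newTable;
    }

    long lastId(String name) {
      Long value = table.get(name);
      return value == null ? 0L : value;
    }
  }

  /** Simulates snapshot installation: the old store is closed and replaced. */
  static FakeTable installSnapshot(FakeTable oldTable, long snapshotId) {
    oldTable.close();
    FakeTable fresh = new FakeTable();
    fresh.put("localId", snapshotId);
    return fresh;
  }

  public static void main(String[] args) {
    FakeTable oldTable = new FakeTable();
    oldTable.put("localId", 100L);
    IdGenerator generator = new IdGenerator(oldTable);

    FakeTable newTable = installSnapshot(oldTable, 500L);

    // Without reinitialize(), the generator still reads through the old handle.
    try {
      generator.lastId("localId");
      System.out.println("unexpected: stale read succeeded");
    } catch (IllegalStateException e) {
      System.out.println("stale handle: " + e.getMessage());
    }

    // After reinitialize(), reads go to the store installed from the snapshot.
    generator.reinitialize(newTable);
    System.out.println("after reinitialize: " + generator.lastId("localId"));
  }
}
```

Running `main` prints `stale handle: table is closed` followed by `after reinitialize: 500`. In this sketch, every component that caches a store handle has to be handed the new one when the store is replaced; the actual fix may be structured differently.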
Issue Links
- fixes HDDS-6732 Follower SCM crashed during snapshot installation (Resolved)
- links to