  1. Apache Ozone
  2. HDDS-2823 SCM HA Support
  3. HDDS-5281

Add reinitialize() for SequenceIdGenerator.


Details

    Description

      After installSnapshot, the bootstrapped SCM crashed shortly afterwards while a write workload was ongoing.

       

      Clues from the core dump file: the newly added SCM crashed in the StateMachineUpdater thread while accessing RocksDB.

      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      #  SIGSEGV (0xb) at pc=0x00007fcefbb5fc0f, pid=1406, tid=0x00007fceecbcb700
      #
      # JRE version: OpenJDK Runtime Environment (8.0_232) (build 1.8.0_232-86)
      # Java VM: OpenJDK 64-Bit Server VM (25.232-b86 mixed mode, sharing linux-amd64 compressed oops)
      # Problematic frame:
      # C  [librocksdbjni7209090472417999125.so+0x242c0f]  rocksdb_get_helper(JNIEnv_*, rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, _jbyteArray*, int, int)+0xcf
      #
      # Core dump written. Default location: /root/core or core.1406
      #
      # If you would like to submit a bug report, please visit:
      #   http://bugreport.java.com/bugreport/crash.jsp
      # The crash happened outside the Java Virtual Machine in native code.
      # See problematic frame for where to report the bug.
      #

      ---------------  T H R E A D  ---------------

      Current thread (0x00007fcf3ded2800):  JavaThread "7a85dabc-3f8c-47e1-bf0a-de75abe92820@group-691FBC3A273C-StateMachineUpdater" daemon [_thread_in_native, id=1559, stack(0x00007fceecacb000,0x00007fceecbcc000)]

      siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000
      

       

      Stack: [0x00007fceecacb000,0x00007fceecbcc000],  sp=0x00007fceecbc96a0,  free space=1017k
      Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
      C  [librocksdbjni7209090472417999125.so+0x242c0f]  rocksdb_get_helper(JNIEnv_*, rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, _jbyteArray*, int, int)+0xcf
      C  [librocksdbjni7209090472417999125.so+0x242ea2]  Java_org_rocksdb_RocksDB_get__J_3BIIJ+0x62
      j  org.rocksdb.RocksDB.get(J[BIIJ)[B+0
      j  org.rocksdb.RocksDB.get(Lorg/rocksdb/ColumnFamilyHandle;[B)[B+13
      j  org.apache.hadoop.hdds.utils.db.RDBTable.get([B)[B+9
      j  org.apache.hadoop.hdds.utils.db.RDBTable.get(Ljava/lang/Object;)Ljava/lang/Object;+5
      j  org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(Ljava/lang/Object;)Ljava/lang/Object;+14
      j  org.apache.hadoop.hdds.utils.db.TypedTable.get(Ljava/lang/Object;)Ljava/lang/Object;+61
      j  org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl.lambda$allocateBatch$0(Ljava/lang/String;)Ljava/lang/Long;+5
      j  org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl$$Lambda$444.apply(Ljava/lang/Object;)Ljava/lang/Object;+8
      J 3481 C1 java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Ljava/lang/Object;Ljava/util/function/Function;)Ljava/lang/Object; (493 bytes) @ 0x00007fcf2daeb9e4 [0x00007fcf2daeb160+0x884]
      j  org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl.allocateBatch(Ljava/lang/String;Ljava/lang/Long;Ljava/lang/Long;)Ljava/lang/Boolean;+11
      v  ~StubRoutines::call_stub
      V  [libjvm.so+0x682be8]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1048
      V  [libjvm.so+0x9a9b49]  Reflection::invoke(instanceKlassHandle, methodHandle, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, Thread*)+0x599
      V  [libjvm.so+0x9ad7ed]  Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, Thread*)+0x14d
      V  [libjvm.so+0x725a66]  JVM_InvokeMethod+0x1e6
      J 2759  sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (0 bytes) @ 0x00007fcf2d2a827d [0x00007fcf2d2a8180+0xfd]
      J 2758 C1 sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (104 bytes) @ 0x00007fcf2d33f194 [0x00007fcf2d33dec0+0x12d4]
      J 5190 C2 sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (10 bytes) @ 0x00007fcf2dff1968 [0x00007fcf2dff1920+0x48]
      j  java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+56
      j  org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(Lorg/apache/hadoop/hdds/scm/ha/SCMRatisRequest;)Lorg/apache/ratis/protocol/Message;+68
      j  org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(Lorg/apache/ratis/statemachine/TransactionContext;)Ljava/util/concurrent/CompletableFuture;+27
      j  org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(Lorg/apache/ratis/proto/RaftProtos$LogEntryProto;)Ljava/util/concurrent/CompletableFuture;+126
      j  org.apache.ratis.server.impl.StateMachineUpdater.applyLog()Lorg/apache/ratis/util/MemoizedSupplier;+142
      j  org.apache.ratis.server.impl.StateMachineUpdater.run()V+29
      j  java.lang.Thread.run()V+11
      v  ~StubRoutines::call_stub
      V  [libjvm.so+0x682be8]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1048
      V  [libjvm.so+0x684127]  JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x2f7
      V  [libjvm.so+0x684660]  JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x60
      V  [libjvm.so+0x71c121]  thread_entry(JavaThread*, Thread*)+0x91
      V  [libjvm.so+0xa8c671]  JavaThread::thread_main_inner()+0xf1
      V  [libjvm.so+0x938f12]  java_start(Thread*)+0x132
      C  [libpthread.so.0+0x7eb5]  start_thread+0xc5
      

       

      The root cause is a missing reinitialize() in SequenceIdGenerator: after a snapshot is installed, SequenceIdGenerator still holds a dangling reference to the old RocksDB instance, which has been removed.
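
      To illustrate the failure mode and the shape of the fix, here is a minimal, self-contained sketch. It is not the actual Ozone patch: FakeTable, IdGenerator and ReinitializeSketch are hypothetical names, and FakeTable only imitates a table whose underlying DB handle can be closed. In the real crash the stale handle is native memory, so the read ends in a SIGSEGV rather than a Java exception. The allocateBatch shape (a computeIfAbsent on a ConcurrentHashMap whose lambda reads the table) follows the stack trace above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReinitializeSketch {

  /** Hypothetical stand-in for a table backed by a closable DB handle. */
  static final class FakeTable {
    private final Map<String, Long> data = new ConcurrentHashMap<>();
    private volatile boolean closed;

    Long get(String key) {
      checkOpen();
      return data.get(key);
    }

    void put(String key, long value) {
      checkOpen();
      data.put(key, value);
    }

    void close() {
      closed = true;
    }

    private void checkOpen() {
      // A real native handle would not fail this politely.
      if (closed) {
        throw new IllegalStateException("table handle is closed");
      }
    }
  }

  /** Hypothetical generator that caches the last allocated id per sequence name. */
  static final class IdGenerator {
    private final Map<String, Long> cache = new ConcurrentHashMap<>();
    private volatile FakeTable table;

    IdGenerator(FakeTable table) {
      this.table = table;
    }

    synchronized boolean allocateBatch(String name, long expectedLastId, long newLastId) {
      // On a cache miss the last id is loaded from the table.
      long lastId = cache.computeIfAbsent(name, k -> {
        Long stored = table.get(k);
        return stored == null ? 0L : stored;
      });
      if (lastId != expectedLastId) {
        return false;
      }
      table.put(name, newLastId);
      cache.put(name, newLastId);
      return true;
    }

    /** Swap in the table of the newly loaded DB and drop state cached from the old one. */
    synchronized void reinitialize(FakeTable newTable) {
      this.table = newTable;
      cache.clear();
    }
  }

  public static void main(String[] args) {
    FakeTable oldTable = new FakeTable();
    IdGenerator generator = new IdGenerator(oldTable);
    generator.allocateBatch("localId", 0L, 1000L);

    // Simulated snapshot install: the old DB goes away, a new one takes its place.
    FakeTable newTable = new FakeTable();
    newTable.put("localId", 5000L);
    oldTable.close();

    // Without this call the generator would keep using oldTable.
    generator.reinitialize(newTable);
    System.out.println(generator.allocateBatch("localId", 5000L, 6000L));
  }
}
```

      In this sketch, skipping reinitialize() after oldTable.close() makes the next allocateBatch() call fail on the closed handle, which mirrors the dangling reference described above. Clearing the cache in reinitialize() is a design choice of the sketch, made so that ids cached from the old DB are not compared against the contents of the new one.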

       

            People

              glengeng Glen Geng