Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
Description
I've observed the following JVM crash after one of the Ignite node restarts on 2.5 (only the relevant part is kept):
Stack: [0x00007f16f40b8000,0x00007f16f41b9000], sp=0x00007f16f41b7308, free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x803675]
J 868 sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x00007f173d351ca1 [0x00007f173d351bc0+0xe1]
J 3023 C1 org.apache.ignite.internal.util.GridUnsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (77 bytes) @ 0x00007f173d9e8d64 [0x00007f173d9e8ae0+0x284]
J 2991 C1 org.apache.ignite.internal.pagemem.PageUtils.putBytes(JI[B)V (73 bytes) @ 0x00007f173d9e1dbc [0x00007f173d9e1d00+0xbc]
j org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(Lorg/apache/ignite/internal/processors/cache/persistence/GridCacheDatabaseSharedManager$CheckpointStatus;ZLorg/apache/ignite/internal/processors/cache/persistence/pagemem/PageMemoryEx;)Lorg/apache/ignite/internal/pagemem/wal/WALPointer;+568
j org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(Lorg/apache/ignite/internal/processors/cache/persistence/GridCacheDatabaseSharedManager$CheckpointStatus;)Lorg/apache/ignite/internal/pagemem/wal/WALPointer;+13
j org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(Ljava/util/List;)V+173
j org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(Z)Lorg/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture$ExchangeType;+311
j org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(Z)V+574
j org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0()V+547
j org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body()V+3
j org.apache.ignite.internal.util.worker.GridWorker.run()V+82
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
V [libjvm.so+0x695b96]
V [libjvm.so+0x6960a1]
V [libjvm.so+0x696537]
V [libjvm.so+0x71596e]
V [libjvm.so+0xa7f243]
V [libjvm.so+0xa7f38c]
V [libjvm.so+0x92e0f8]
C [libpthread.so.0+0x76ba] start_thread+0xca
It looks like the issue is caused by a page whose ID was rotated, after which the node failed before the checkpoint finished. Then, on the second node restart, the page was written to disk, but the node was stopped again before the checkpoint marker was written.
Then, on the second node restart, we attempt to write-lock the page, but the lock fails because the page tag logged to the WAL is different from the one written in the store.
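The tag mismatch described above can be illustrated with a small toy model. This is only a sketch and not Ignite's actual code: the class PageTagSketch, the 8-bit tag layout, and the methods rotate, tryWriteLock and applySnapshot are all hypothetical names invented for illustration. It assumes that a page ID carries a rotation tag, that a write lock is granted only when the caller's tag matches the stored one, and that the crash in copyMemory presumably comes from the restore path writing without a successfully acquired lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of tag-checked page locking; NOT Ignite's real implementation. */
public class PageTagSketch {
    /** Hypothetical layout: the top 8 bits hold the rotation tag, the rest the page index. */
    static final int TAG_SHIFT = 56;
    static final long IDX_MASK = (1L << TAG_SHIFT) - 1;

    static long pageId(int tag, long idx) {
        return ((long) (tag & 0xFF) << TAG_SHIFT) | (idx & IDX_MASK);
    }

    static int tag(long pageId) {
        return (int) (pageId >>> TAG_SHIFT);
    }

    /** Same page index, tag incremented by one (wraps around at 8 bits). */
    static long rotate(long pageId) {
        return pageId(tag(pageId) + 1, pageId & IDX_MASK);
    }

    static final class Page {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final long storedId;
        final byte[] data = new byte[16];

        Page(long storedId) {
            this.storedId = storedId;
        }

        /** Returns true (and keeps the write lock held) only if the tags match. */
        boolean tryWriteLock(long expectedId) {
            lock.writeLock().lock();
            if (tag(storedId) != tag(expectedId)) {
                lock.writeLock().unlock();
                return false;
            }
            return true;
        }

        void writeUnlock() {
            lock.writeLock().unlock();
        }
    }

    /** Applies a logged page snapshot, refusing to write if the lock was not acquired. */
    static boolean applySnapshot(Page page, long loggedId, byte[] bytes) {
        if (!page.tryWriteLock(loggedId))
            return false; // Writing anyway would touch memory that was never locked.
        try {
            System.arraycopy(bytes, 0, page.data, 0, bytes.length);
            return true;
        } finally {
            page.writeUnlock();
        }
    }

    public static void main(String[] args) {
        long logged = pageId(1, 42);   // ID as recorded in the log, tag 1
        long stored = rotate(logged);  // ID in the store after rotation, tag 2
        Page page = new Page(stored);
        System.out.println(applySnapshot(page, logged, new byte[] {1, 2, 3})); // false
        System.out.println(applySnapshot(page, stored, new byte[] {1, 2, 3})); // true
    }
}
```

In this sketch the caller checks the lock result before copying bytes; skipping that check is the kind of mistake that, in native memory, could end in a crash rather than an exception.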
Issue Links
- is duplicated by: IGNITE-9303 PageSnapshot can contain wrong pageId tag when not dirty page is recycling (Resolved)