Uploaded image for project: 'Ignite'
  1. Ignite
  2. IGNITE-15227

Improve diagnostic capabilities of persistence corruptions

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.12
    • persistence
    • None
    • Release Notes Required

    Description

      There are some diagnostic problems:

      • assertions inside of PagesList can lead to CorruptedTreeException, which makes no sense. Example: 
        2020-11-30 20:17:27.170[ERROR]sys-stripe-29-#30%DPL_GRID%DplGridNodeName%[org.apache.ignite.Ignite] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-782612924, val2=72372732968376779]], groupName=CACHEGROUP_PARTICLE_union-module_com.sbt.processing.data.partition.dpl.PartitionKey, msg=Runtime failure on search row: SearchRow [key=KeyCacheObject [hasValBytes=true], hash=513719283, cacheId=-295471981]]]]
        2org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-782612924, val2=72372732968376779]], groupName=CACHEGROUP_PARTICLE_union-module_com.sbt.processing.data.partition.dpl.PartitionKey, msg=Runtime failure on search row: SearchRow [key=KeyCacheObject [hasValBytes=true], hash=513719283, cacheId=-295471981]]
        3at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:6117)
        4at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1937)
        5at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1670)
        6at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1653)
        7at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2519)
        8at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436)
        9at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4312)
        10at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4289)
        11at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerSet(GridCacheMapEntry.java:1555)
        12at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:756)
        13at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter.localFinish(GridDhtTxLocalAdapter.java:794)
        14at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.localFinish(GridDhtTxLocal.java:605)
        15at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.finishTx(GridDhtTxLocal.java:477)
        16at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitDhtLocalAsync(GridDhtTxLocal.java:534)
        17at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finishDhtLocal(IgniteTxHandler.java:1092)
        18at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:968)
        19at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxFinishRequest(IgniteTxHandler.java:923)
        20at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$200(IgniteTxHandler.java:132)
        21at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:229)
        22at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:227)
        23at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
        24at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
        25at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
        26at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
        27at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
        28at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
        29at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1722)
        30at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1329)
        31at org.apache.ignite.internal.managers.communication.GridIoManager.access$4600(GridIoManager.java:158)
        32at org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1214)
        33at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:54)
        34at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
        35at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
        36at java.lang.Thread.run(Thread.java:748)
        37Caused by: java.lang.AssertionError: Incorrectly recycled pageId in reuse bucket: ff011e9e000012f7
        38at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.takeEmptyPage(PagesList.java:1358)
        39at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.insertDataRow(AbstractFreeList.java:517)
        40at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeList.insertDataRow(CacheFreeList.java:74)
        41at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeList.insertDataRow(CacheFreeList.java:35)
        42at org.apache.ignite.internal.processors.cache.persistence.RowStore.addRow(RowStore.java:112)
        43at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.createRow(IgniteCacheOffheapManagerImpl.java:1720)
        44at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.createRow(GridCacheOffheapManager.java:2494)
        45at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5876)
        46at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5813)
        47at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:4000)
        48at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5700(BPlusTree.java:3894)
        49at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2020)
        50at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
        51at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
        52at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904)
        
      • corruptions of partition meta also lead to mismatching exception type in pages list, e.g.:
        2021-01-29 05:48:41.644[ERROR][db-checkpoint-thread-#307%DPL_GRID%DplGridNodeName%][org.apache.ignite.Ignite] Critical system error detected. Will be handled accordingly to configured handler [
        2hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failu
        3reCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.AssertionError: Missing tails [bucket=250, tails=null, metaPage=000120ca00002798]]]
        4java.lang.AssertionError: Missing tails [bucket=250, tails=null, metaPage=000120ca00002798]
        5        at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.updateTail(PagesList.java:624)
        6        at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.mergeNoNext(PagesList.java:1628)
        7        at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.removeDataPage(PagesList.java:1577)
        8        at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList$RemoveRowHandler.run(AbstractFreeList.java:318)
        9        at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList$RemoveRowHandler.run(AbstractFreeList.java:273)
        10        at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:292)
        11        at org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:273)
        12        at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.removeDataRowByLink(AbstractFreeList.java:633)
        13        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:367)
        14        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.lambda$syncMetadata$2(GridCacheOffheapManager.java:288)
        15        at org.apache.ignite.internal.util.IgniteUtils.lambda$wrapIgniteFuture$3(IgniteUtils.java:11665)
        16        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        17        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        18        at java.lang.Thread.run(Thread.java:748)
        

        reproducer: https://github.com/gridgain/apache-ignite/blob/2603e9a01bc1f6033b760ef02ebaba9a8069b84b/modules/core/src/test/java/org/apache/ignite/Reproducer12005.java

      All such exceptions should be passed to DiagnosticProcessor and contain page ids that are possibly corrupted, to be able to abalyze them in PDS.

      Attachments

        Issue Links

          Activity

            People

              Denis Chudov Denis Chudov
              Denis Chudov Denis Chudov
              Sergey Chugunov Sergey Chugunov
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m