Ignite / IGNITE-8320

Page corruption during cache rebalancing


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 2.5
    • Component/s: persistence
    • Labels: None

    Description

      Cache rebalance may result in page memory corruption.

      [2018-04-18T14:33:23,260][ERROR][sys-#54][GridCacheIoManager] Failed processing message [senderId=95f06c25-e6bb-48f7-a3e5-4c05fc1c49be, msg=GridDhtPartitionSupplyMessage [rebalanceId=37, topVer=AffinityTopologyVersion [topVer=53, minorTopVer=1], missed=null, clean=null, msgSize=525350, estimatedKeysCnt=1690216, size=2, parts=[1, 2], super=GridCacheGroupIdMessage [grpId=-1831596270]]]
       org.apache.ignite.IgniteException: Runtime failure on row: Row@33b6805c[ key: xxxx [idHash=773709078, hash=-630455542, ...], val: xxxx [idHash=1309051286, hash=-1321165334, ver: GridCacheVersion [topVer=135435024, order=1523963943331, nodeOrder=4] ]
       at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doPut(BPlusTree.java:2102) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putx(BPlusTree.java:2049) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.putx(H2TreeIndex.java:247) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:454) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.store(IgniteH2Indexing.java:653) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.query.GridQueryProcessor.store(GridQueryProcessor.java:1866) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:407) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishUpdate(IgniteCacheOffheapManagerImpl.java:1391) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1255) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1451) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:352) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:3527) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:2735) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.preloadEntry(GridDhtPartitionDemander.java:823) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.handleSupplyMessage(GridDhtPartitionDemander.java:704) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleSupplyMessage(GridDhtPreloader.java:347) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:365) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:355) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$700(GridCacheIoManager.java:99) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1603) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.managers.communication.GridIoManager.access$4100(GridIoManager.java:126) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:2751) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1515) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.managers.communication.GridIoManager.access$4400(GridIoManager.java:126) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.managers.communication.GridIoManager$10.run(GridIoManager.java:1484) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
       Caused by: java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted)
       at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:83) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:95) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:148) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:102) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.query.h2.database.H2RowFactory.getRow(H2RowFactory.java:61) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.query.h2.database.H2Tree.createRowFromLink(H2Tree.java:149) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.query.h2.database.io.H2LeafIO.getLookupRow(H2LeafIO.java:67) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.query.h2.database.io.H2LeafIO.getLookupRow(H2LeafIO.java:33) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:167) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:46) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.getRow(BPlusTree.java:4436) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.query.h2.database.H2Tree.compare(H2Tree.java:209) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.query.h2.database.H2Tree.compare(H2Tree.java:46) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:4423) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findInsertionPoint(BPlusTree.java:4343) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$1500(BPlusTree.java:82) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run0(BPlusTree.java:270) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4770) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4755) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:158) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.DataStructure.read(DataStructure.java:320) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2317) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2329) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2329) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doPut(BPlusTree.java:2069) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
       ... 30 more
      

      Possible cause and reproducer:
      1) Start partition eviction.
      2) Force-kill the node (kill -9) after the partition file has been truncated.
      3) Start the node again and iterate over the index.

      The main problem is that file truncation is not synchronized with the actual checkpoint. As a result, after crash recovery the index tree can still contain links to data pages that were already removed by the truncation.
      One possible solution is to mark such partition files for deletion and safely truncate them on the next checkpoint.
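
The difference between the two eviction orders can be illustrated with a toy model. This is only a sketch and not Ignite code: the class DeferredTruncationSketch and all of its methods are hypothetical, data pages are assumed to reach disk immediately, and the index is assumed to become durable only at a checkpoint.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Toy model of partition eviction vs. checkpoint ordering. Hypothetical names, not Ignite classes. */
public class DeferredTruncationSketch {
    /** Partition id -> data page ids in the partition file (assumed durable immediately). */
    private final Map<Integer, Set<Long>> disk = new HashMap<>();
    /** Index links (page id -> partition id) as persisted by the last checkpoint. */
    private Map<Long, Integer> checkpointedIndex = new HashMap<>();
    /** Index links in memory; lost on crash unless checkpointed. */
    private Map<Long, Integer> memoryIndex = new HashMap<>();
    /** Partitions marked for deletion, truncated only at the next checkpoint. */
    private final Queue<Integer> pendingTruncation = new ArrayDeque<>();

    public void put(int partId, long pageId) {
        disk.computeIfAbsent(partId, k -> new HashSet<>()).add(pageId);
        memoryIndex.put(pageId, partId);
    }

    /** Problematic order: the file is truncated before the index change is checkpointed. */
    public void evictUnsafe(int partId) {
        memoryIndex.values().removeIf(p -> p == partId);
        disk.remove(partId);
    }

    /** Proposed order: only mark the partition, leave the file intact for now. */
    public void evictDeferred(int partId) {
        memoryIndex.values().removeIf(p -> p == partId);
        pendingTruncation.add(partId);
    }

    /** Persist the index first, then truncate the files that were marked for deletion. */
    public void checkpoint() {
        checkpointedIndex = new HashMap<>(memoryIndex);
        Integer partId;
        while ((partId = pendingTruncation.poll()) != null)
            disk.remove(partId);
    }

    /** Simulates kill -9 followed by restart: the index is restored from the last checkpoint. */
    public void crashAndRecover() {
        memoryIndex = new HashMap<>(checkpointedIndex);
        pendingTruncation.clear();
    }

    /** Counts index links that point to data pages no longer present on disk. */
    public long danglingLinks() {
        return memoryIndex.entrySet().stream()
            .filter(e -> !disk.getOrDefault(e.getValue(), Collections.emptySet()).contains(e.getKey()))
            .count();
    }

    public static void main(String[] args) {
        DeferredTruncationSketch unsafe = new DeferredTruncationSketch();
        unsafe.put(1, 100L);
        unsafe.put(1, 101L);
        unsafe.checkpoint();
        unsafe.evictUnsafe(1);
        unsafe.crashAndRecover();
        System.out.println("unsafe:   " + unsafe.danglingLinks()); // prints "unsafe:   2"

        DeferredTruncationSketch deferred = new DeferredTruncationSketch();
        deferred.put(1, 100L);
        deferred.put(1, 101L);
        deferred.checkpoint();
        deferred.evictDeferred(1);
        deferred.crashAndRecover();
        System.out.println("deferred: " + deferred.danglingLinks()); // prints "deferred: 0"
    }
}
```

In this model a crash after a deferred eviction simply loses the eviction, which then has to be restarted. A real implementation would presumably also need to make the deletion mark itself survive a crash, which the sketch does not attempt.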

      This mechanism can be resurrected from the ignite-2.0.2.b1 branch.
      See

      org/gridgain/grid/internal/processors/cache/database/GridCacheDatabaseSharedManager.java:3059
      org.gridgain.grid.cache.db.GridCacheOffheapManager#destroyCacheDataStore
      

          People

            jokser Pavel Kovalenko
            slava.koptilin Vyacheslav Koptilin