  1. Ignite
  2. IGNITE-7540

Sequential checkpoints cause overwrite of already cleaned & freed offheap page

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 2.5
    • Component/s: persistence
    • Labels:
      None

      Description

      The sequence of events is as follows:

      In GridCacheProcessor.onExchangeDone(), sharedCtx.database().waitForCheckpoint("caches stop") is performed, and then the cache is destroyed and all its pages are freed and cleared asynchronously.

      However, it is entirely possible that the next checkpoint will start immediately after waitForCheckpoint() returns. This is typical when a lot of data is being loaded into Ignite, leading to rapid checkpoint buffer depletion, and it also happens with an artificially increased checkpoint frequency, as used in the reproducer.

      Then the checkpointer will save (overwrite) the metadata page:

          at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1330)
          at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:428)
          at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:422)
          at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:375)
          at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.onCheckpointBegin(GridCacheOffheapManager.java:163)
          at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:2309)
          at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:2088)
          at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:2013)
          at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
          at java.lang.Thread.run(Thread.java:748)

      This happens after the cache has already been destroyed, and even after the page has already been zeroed by PageMemoryImpl$ClearSegmentRunnable.run().

      Then a new cache is created, and in GridCacheOffheapManager$GridCacheDataStore.getOrAllocatePartitionMetas(), pageMem.acquirePage() returns this page, which is expected to be zeroed but actually contains metadata for the old cache's partition. The type == PageIO.T_PART_META check then returns true, and the following exception is thrown, leading to cache state inconsistency and data loss:

      Caused by: java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted)
          at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:83)
          at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:95)
          at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.init(PagesList.java:175)
          at org.apache.ignite.internal.processors.cache.persistence.freelist.FreeListImpl.<init>(FreeListImpl.java:370)
          at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore$1.<init>(GridCacheOffheapManager.java:932)
          at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.init0(GridCacheOffheapManager.java:929)
          at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1295)
          at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:344)
          at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:3191)
          at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:2571)
          at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$IsolatedUpdater.receive(DataStreamerImpl.java:2096)
          at org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:140)
          at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:397)
          at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:302)
          at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:59)
          at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:89)
          ... 6 more
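
      To make the ordering above easier to follow, here is a minimal, single-threaded toy model of the page lifecycle. It is not Ignite code and not the attached reproducer: the class StalePageSketch, the page layout, the constant TYPE_PART_META (standing in for PageIO.T_PART_META, with an arbitrary value) and the cache id are all assumptions made up for illustration. It only shows that a metadata write landing after the page has been cleared leaves the next owner of the page looking at stale partition metadata.

```java
import java.nio.ByteBuffer;

/** Toy model of the sequence described in this issue. Not Ignite code. */
public class StalePageSketch {
    /** Hypothetical marker standing in for PageIO.T_PART_META (value is arbitrary). */
    static final short TYPE_PART_META = 1;

    /** Hypothetical page size for the sketch. */
    static final int PAGE_SIZE = 64;

    /** One "off-heap" page, modelled as a heap buffer: [type: 2 bytes][owner cache id: 4 bytes]. */
    final ByteBuffer page = ByteBuffer.allocate(PAGE_SIZE);

    /** Models the checkpointer saving partition metadata into the page. */
    void writePartitionMeta(int cacheId) {
        page.putShort(0, TYPE_PART_META);
        page.putInt(2, cacheId);
    }

    /** Models the asynchronous clearing of a freed page (all bytes zeroed). */
    void clearPage() {
        for (int i = 0; i < PAGE_SIZE; i++)
            page.put(i, (byte)0);
    }

    /** Reads the page type the way a new owner of the page would. */
    short pageType() {
        return page.getShort(0);
    }

    /**
     * Replays the sequence of events.
     *
     * @param secondCheckpointAfterClear whether a second checkpoint writes metadata after the page was cleared.
     * @return true if the new cache finds stale partition metadata in a page it expects to be zeroed.
     */
    static boolean reproduce(boolean secondCheckpointAfterClear) {
        StalePageSketch mem = new StalePageSketch();
        int oldCacheId = 100; // hypothetical id of the cache being destroyed

        // 1. First checkpoint saves metadata; waitForCheckpoint() returns.
        mem.writePartitionMeta(oldCacheId);

        // 2. The cache is destroyed and its page is freed and zeroed.
        mem.clearPage();

        // 3. A second checkpoint starts right away and still saves the old cache's metadata.
        if (secondCheckpointAfterClear)
            mem.writePartitionMeta(oldCacheId);

        // 4. A new cache acquires the page and checks its type.
        return mem.pageType() == TYPE_PART_META;
    }

    public static void main(String[] args) {
        System.out.println("no racing checkpoint -> stale meta seen: " + reproduce(false));
        System.out.println("racing checkpoint    -> stale meta seen: " + reproduce(true));
    }
}
```

      In the real system steps 2 and 3 run on different threads, so the outcome depends on timing; the sketch fixes the order to keep the failing interleaving visible.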

        Attachments

        1. IgnitePdsDestroyCacheTest.java
          6 kB
          Ilya Kasnacheev

              People

              • Assignee:
                agoncharuk Alexey Goncharuk
                Reporter:
                ilyak Ilya Kasnacheev
