Ignite / IGNITE-7507

Ignite node performance drop during checkpoint start: store metapage eviction causes long checkpoint lock hold time

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5
    • Component/s: persistence
    • Labels: None

    Description

      Eviction of store metadata pages becomes very expensive at checkpoint start.

      Reading these pages back hangs the Ignite node until the metadata is loaded from disk.

      The following store (partition) metapages are required during execution of saveStoreMetadata() and markCheckpointBegin():

      • Partition Metadata Page
      • Freelist Meta Page
      • Partition Counters IO

      If one of these pages is not in memory, it is loaded from disk.
      These loads are performed while holding checkpointLock in write mode.
      Example timing:

      • checkpointLockWait=75ms, checkpointLockHoldTime=2653ms, pages=696120

      During all this time, worker threads cannot put any data into any cache.
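
      The effect described above can be illustrated with a small model. This is a minimal sketch, not Ignite code: the class CheckpointLockSketch, its methods, and the page names are hypothetical, and a "disk read" is simply a miss in an in-memory set. It counts how many metapage loads happen while the write lock is held, depending on whether the metapages were evicted beforehand.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of the problem; names do not correspond to Ignite internals. */
final class CheckpointLockSketch {
    /** Stand-in for the checkpoint lock: workers would take the read lock, the checkpointer the write lock. */
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();

    /** Pages currently resident in memory. */
    private final Set<String> residentPages = new HashSet<>();

    /** Number of simulated disk reads performed while the write lock was held. */
    private int diskReadsUnderWriteLock;

    /** Acquires a page; a miss in residentPages models a load from disk. */
    private void acquirePage(String pageName) {
        boolean loadedFromDisk = residentPages.add(pageName);

        if (loadedFromDisk && checkpointLock.isWriteLockedByCurrentThread())
            diskReadsUnderWriteLock++;
    }

    /** Models eviction of every page, including metapages. */
    void evictAll() {
        residentPages.clear();
    }

    /** Touches all metapages under the write lock and returns the number of disk reads done there. */
    int checkpointBegin(List<String> metaPages) {
        diskReadsUnderWriteLock = 0;

        checkpointLock.writeLock().lock();

        try {
            for (String page : metaPages)
                acquirePage(page);
        }
        finally {
            checkpointLock.writeLock().unlock();
        }

        return diskReadsUnderWriteLock;
    }
}
```

      In this model, every metapage that was evicted before checkpoint start turns into a disk read inside the write-locked section, which is exactly the time during which threads waiting for the read lock are stalled. If the metapages stay resident, the locked section performs no disk reads.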

      Eviction of such pages should be avoided: they should be evicted with a lower priority than dirty pages.
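
      One way to read this proposal is as an ordering over eviction candidates. The following is a minimal sketch under stated assumptions, not the fix that was applied in Ignite: EvictionPrioritySketch, PageKind, Page, and pickVictim are hypothetical names, and a plain "least recently used within the same kind" rule stands in for whatever replacement policy the page memory actually uses.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical eviction ordering; names do not correspond to Ignite internals. */
final class EvictionPrioritySketch {
    /** Declaration order is eviction order: earlier kinds are evicted first. */
    enum PageKind { CLEAN_DATA, DIRTY_DATA, STORE_META }

    /** Minimal page descriptor used only by this sketch. */
    static final class Page {
        final long id;
        final PageKind kind;
        final long lastAccess;

        Page(long id, PageKind kind, long lastAccess) {
            this.id = id;
            this.kind = kind;
            this.lastAccess = lastAccess;
        }
    }

    /** Orders candidates by kind first, then by oldest access time. */
    private static final Comparator<Page> EVICTION_ORDER =
        Comparator.comparing((Page p) -> p.kind).thenComparingLong(p -> p.lastAccess);

    /** Picks the page to evict, so that store metapages go last, after dirty pages. */
    static Optional<Page> pickVictim(List<Page> candidates) {
        return candidates.stream().min(EVICTION_ORDER);
    }
}
```

      With this ordering a store metapage is chosen only when no clean or dirty data page is available as a candidate, even if the metapage is the least recently accessed page in the set.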

      (Full stacktrace)

      db-checkpoint-thread-#40%checkpoint.IgniteMassLoadSandboxTest1% Id=63 WAITING
      	at sun.misc.Unsafe.park(Native Method)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
      	at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
      	at org.apache.ignite.internal.util.future.GridFutureAdapter.getUninterruptibly(GridFutureAdapter.java:145)
      	at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.read(AsyncFileIO.java:95)
      	at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.read(FilePageStore.java:324)
      	at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:306)
      	at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:291)
      	at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:656)
      	at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:576)
      	at org.apache.ignite.internal.processors.cache.persistence.DataStructure.acquirePage(DataStructure.java:130)
      	at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.saveMetadata(PagesList.java:301)
      	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:196)
      	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.onCheckpointBegin(GridCacheOffheapManager.java:168)
      	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3022)
      	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:2719)
      	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:2644)
      	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
      	at java.lang.Thread.run(Thread.java:748)
      

            People

              dpavlov Dmitry Pavlov