Uploaded image for project: 'Ignite'
  1. Ignite
  2. IGNITE-15429

Taking a snapshot may increase the PME execution time by the checkpointFrequency interval.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.11
    • None
    • Release Notes Required

    Description

      When a snapshot is taken, a checkpoint is forced on all server nodes.

      In a rare case, when forcing a checkpoint, the start of the snapshot operation is set to the planned (instead of the current) checkpoint. In this case, the local partition exchange future does not finish until the next checkpoint starts (by timeout), which significantly increases the execution time of the exchange.

      Log output on a node with a checkpoint frequency of 60 seconds (stage="Exchange done" (61542 ms)).

      2021-08-31 23:30:04.792 [INFO ][exchange-worker-#179][org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotFutureTask] Snapshot operation is scheduled on local node and will be handled by the checkpoint listener [sctx=SnapshotFutureTask [pageStore=GridCacheSharedManagerAdapter [starting=true, stop=false], srcNodeId=a49f4c59-a4d1-4b02-b416-ceede4ffc0ba, snpName=20210831233001_snapshot, tmpSnpWorkDir=/opt/ignite/ssd/data/something/snp/20210831233001_snapshot, locBuff=java.lang.ThreadLocal$SuppliedThreadLocal@6fd06d5f, ioFactory=org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory@54067b77, cpEndFut=java.util.concurrent.CompletableFuture@382a989[Not completed], startedFut=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=378462544], tmpConsIdDir=/opt/ignite/ssd/data/something/snp/20210831233001_snapshot/db/something, closeFut=null, err=null, started=true], topVer=AffinityTopologyVersion [topVer=515, minorTopVer=2]]
      2021-08-31 23:30:05.444 [INFO ][db-checkpoint-thread-#236][org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=b0e9b43d-02f4-44fb-90c5-41dc6c294248, startPtr=FileWALPointer [idx=13473, fileOff=1008077757, len=236399], checkpointBeforeLockTime=352ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=325ms, checkpointLockHoldTime=331ms, walCpRecordFsyncDuration=3ms, writeCheckpointEntryDuration=0ms, splitAndSortCpPagesDuration=8ms,  pages=15001, reason='timeout']
      2021-08-31 23:30:05.671 [INFO ][db-checkpoint-thread-#236][org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager] Checkpoint finished [cpId=b0e9b43d-02f4-44fb-90c5-41dc6c294248, pages=15001,markPos=FileWALPointer [idx=13473, fileOff=1008077757, len=236399], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=343ms, pagesWrite=128ms, fsync=99ms, total=922ms]
      
      ...60 seconds later...
      
      2021-08-31 23:31:05.779 [INFO ][db-checkpoint-thread-#236][org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotSender] Resolved snapshot work directory: /opt/ignite/sas/snapshot/20210831233001_snapshot/db/something
      2021-08-31 23:31:05.812 [INFO ][db-checkpoint-thread-#236][org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotFutureTask] Submit partition processing tasks with partition allocated lengths: ...
      
      2021-08-31 23:31:05.837 [INFO ][db-checkpoint-thread-#236][org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager] Skipping checkpoint (no pages were modified) [checkpointBeforeLockTime=328ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=298ms, checkpointLockHoldTime=304ms, reason='timeout']
      ...
      2021-08-31 23:31:06.315 [INFO ][exchange-worker-#179][org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture] Completed partition exchange [localNode=1bd343d0-d6b5-4ea3-880b-ebda77bc48d7, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=515, minorTopVer=2], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode [id=a49f4c59-a4d1-4b02-b416-ceede4ffc0ba, consistentId=epk_rb_sylvanas40.ca.sbrf.ru, addrs=ArrayList [10.126.13.32, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500, sylvanas40.ca.sbrf.ru/10.126.13.32:47500], discPort=47500, order=60, intOrder=60, lastExchangeTime=1626477723350, loc=false, ver=2.9.0#20210705-sha1:b32c94b2, isClient=false], rebalanced=true, done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=515, minorTopVer=2]]
      2021-08-31 23:31:06.315 [INFO ][exchange-worker-#179][org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture] Exchange timing [startVer=AffinityTopologyVersion [topVer=515, minorTopVer=2], resVer=AffinityTopologyVersion [topVer=515, minorTopVer=2], stage="Waiting in exchange queue" (0 ms), stage="Exchange parameters initialization" (1 ms), stage="Determine exchange type" (24 ms), stage="Preloading notification" (0 ms), stage="Wait partitions release [latch=exchange]" (0 ms), stage="Wait partitions release latch [latch=exchange]" (40 ms), stage="Wait partitions release [latch=exchange]" (0 ms), stage="After states restored callback" (0 ms), stage="WAL history reservation" (12 ms), stage="Exchange done" (61542 ms), stage="Total time" (61619 ms)]
      

      Attachments

        Activity

          People

            xtern Pavel Pereslegin
            xtern Pavel Pereslegin
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 20m
                20m