Ignite / IGNITE-14684

Stopping node at the end of checkpoint can cause "Critical system error"

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.11
    • Component/s: persistence
    • Labels:
      None
    • Release Note:
      Fixed a node failure caused by deleting DurableBackgroundTasks at the end of a checkpoint while the node is stopping.
    • Ignite Flags:
      Release Notes Required

      Description

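      The checkpoint listener org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor#afterCheckpointEnd, which is triggered at the end of the checkpoint process, cannot acquire the checkpoint read lock while the node is stopping. The resulting exception propagates out of the checkpoint thread and is reported as a critical system error (failure type SYSTEM_WORKER_TERMINATION in the log below).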

      To reproduce, run the following test and check the log for the exception: org.apache.ignite.internal.processors.cache.persistence.db.LongDestroyDurableBackgroundTaskTest#testDestroyTaskLifecycle

      [2021-05-05 15:41:10,907][ERROR][db-checkpoint-thread-#87%db.LongDestroyDurableBackgroundTaskTest0%][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteException: Failed to perform cache update: node is stopping.]]
      class org.apache.ignite.IgniteException: Failed to perform cache update: node is stopping.
      	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:127)
      	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1583)
      	at org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.metaStorageOperation(DurableBackgroundTasksProcessor.java:335)
      	at org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.afterCheckpointEnd(DurableBackgroundTasksProcessor.java:152)
      	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointWorkflow.markCheckpointEnd(CheckpointWorkflow.java:606)
      	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:479)
      	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.body(Checkpointer.java:282)
      	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: class org.apache.ignite.internal.NodeStoppingException: Failed to perform cache update: node is stopping.
      	... 9 more
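The failure mode in the stack trace above can be modelled outside Ignite. The sketch below is a simplified illustration, not Ignite code and not necessarily the fix applied for this issue: `StoppableReadLock`, `NodeStoppingSketchException`, and both listener methods are hypothetical stand-ins. It assumes only what the trace shows, namely that acquiring the read lock throws once the node is stopping, and it contrasts an unguarded listener with one that skips its cleanup in that case.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class CheckpointStopSketch {
    /** Hypothetical stand-in for the "node is stopping" exception seen in the trace. */
    static class NodeStoppingSketchException extends RuntimeException {
        NodeStoppingSketchException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical read lock that refuses acquisition once stopping has begun. */
    static class StoppableReadLock {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final AtomicBoolean stopping = new AtomicBoolean();

        void markStopping() {
            stopping.set(true);
        }

        void readLock() {
            if (stopping.get())
                throw new NodeStoppingSketchException("Failed to perform cache update: node is stopping.");

            lock.readLock().lock();
        }

        void readUnlock() {
            lock.readLock().unlock();
        }
    }

    /** Unguarded listener: the exception escapes to the caller (the checkpoint thread in the report). */
    static void afterCheckpointEndUnguarded(StoppableReadLock lock, Runnable cleanup) {
        lock.readLock();

        try {
            cleanup.run();
        }
        finally {
            lock.readUnlock();
        }
    }

    /** Guarded listener: skips the cleanup when the node is stopping. Returns true if the cleanup ran. */
    static boolean afterCheckpointEndGuarded(StoppableReadLock lock, Runnable cleanup) {
        try {
            lock.readLock();
        }
        catch (NodeStoppingSketchException e) {
            // Assumption: the skipped cleanup can safely be retried after the next start.
            return false;
        }

        try {
            cleanup.run();
        }
        finally {
            lock.readUnlock();
        }

        return true;
    }

    public static void main(String[] args) {
        StoppableReadLock lock = new StoppableReadLock();

        System.out.println("cleanup ran before stop: "
            + afterCheckpointEndGuarded(lock, () -> System.out.println("removing completed tasks")));

        lock.markStopping();

        System.out.println("cleanup ran during stop: "
            + afterCheckpointEndGuarded(lock, () -> System.out.println("removing completed tasks")));

        try {
            afterCheckpointEndUnguarded(lock, () -> System.out.println("removing completed tasks"));
        }
        catch (NodeStoppingSketchException e) {
            System.out.println("unguarded listener failed: " + e.getMessage());
        }
    }
}
```

Whether skipping the cleanup is acceptable depends on the tasks being durable enough to be handled again after restart; the sketch only assumes this and does not verify it against the Ignite code base.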
      

              People

              • Assignee:
                ktkalenko@gridgain.com Kirill Tkalenko
                Reporter:
                makedonskaya Maria Makedonskaya
                Reviewer:
                Ivan Bessonov
              • Votes:
                0
                Watchers:
                4

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  20m