  1. Ignite
  2. IGNITE-15295

Server node that has an empty checkpoint file-XXX-START.bin does not start

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.12
    • Component/s: persistence
    • Labels: None

    Description

      A server node that has an empty checkpoint file-XXX-START.bin fails to start with the following error:

      2021-06-08 16:00:33.383[ERROR][Thread-19][o.a.i.i.IgniteKernal%DPL_GRID%DplGridNodeName] Exception during start processors, node will be stopped and close connections
      java.nio.BufferUnderflowException: null
              at java.nio.Buffer.nextGetIndex(Buffer.java:532)
              at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:417)
              at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage.readPointer(CheckpointMarkersStorage.java:301)
              at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage.readCheckpointStatus(CheckpointMarkersStorage.java:218)
              at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointManager.readCheckpointStatus(CheckpointManager.java:265)
              at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointStatus(GridCacheDatabaseSharedManager.java:1642)
              at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:584)
              at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:2999)
              at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1205)
              at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2105)
              at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1768)
              at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1147)
              at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:667)
              at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:593)
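
For illustration, the exception at the top of this trace can be reproduced outside Ignite. The sketch below is not Ignite code: the class name, the method name and the 16-byte buffer size are assumptions made for the example. It only shows that reading a long from a heap buffer that received zero bytes fails with BufferUnderflowException.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

public class EmptyMarkerRead {
    /**
     * Copies the (simulated) file content into a fixed-size heap buffer,
     * flips it and tries to read the first long.
     * Returns false if the buffer underflows.
     */
    static boolean firstLongReadable(byte[] fileContent) {
        ByteBuffer buf = ByteBuffer.allocate(16); // assumed buffer size, illustration only
        buf.put(fileContent, 0, Math.min(fileContent.length, buf.capacity()));
        buf.flip(); // limit is now the number of bytes actually "read"

        try {
            buf.getLong();
            return true;
        } catch (BufferUnderflowException e) {
            // Fewer than 8 bytes are available, as with an empty marker file.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLongReadable(new byte[0]));  // empty file content
        System.out.println(firstLongReadable(new byte[16])); // fully written content
    }
}
```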
      

      The checkpoint marker is always fully written to a temp file first, and then this file is renamed (see org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage#writeCheckpointEntry(java.nio.ByteBuffer, org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry, org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntryType, boolean)).
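
The write-then-rename pattern described here can be sketched as follows. This is a generic illustration and not the body of writeCheckpointEntry: the class name, the method name and the ".tmp" suffix are hypothetical. With this pattern a reader should see either no marker file or a complete one, which is why an empty marker is unexpected.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class AtomicMarkerWrite {
    /** Writes the payload to a sibling temp file, forces it to disk, then renames it. */
    static void writeMarker(Path target, byte[] payload) throws IOException {
        // Hypothetical temp-file naming, chosen for the example only.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");

        try (FileChannel ch = FileChannel.open(tmp,
            StandardOpenOption.CREATE,
            StandardOpenOption.TRUNCATE_EXISTING,
            StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(payload);

            while (buf.hasRemaining())
                ch.write(buf);

            ch.force(true); // flush content and metadata before the rename
        }

        // The marker becomes visible under its final name only after it is fully written.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```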

      So the root cause of this error is unclear, unless the file was somehow modified. We need extended information if such an error happens again; in this case we have nothing to analyze (the LFS was cleared right after the error occurred).

      At the same time, we can’t guarantee correct operation when checkpoint markers are inconsistent. We can’t simply ignore broken markers, and recovering from the previous checkpoint is not that simple either.

      But it seems reasonable to catch all reading-related exceptions in org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage#readPointer.
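
A minimal sketch of what such handling might look like is given below. It is not the actual readPointer implementation or the committed fix: the class name, the pointer layout (one long followed by two ints) and the message format are assumptions for illustration. The idea is to wrap any reading-related exception in one that names the file and its size, so the failure can be analyzed later.

```java
import java.io.IOException;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MarkerReader {
    /** Assumed pointer layout for this example: long index, int offset, int length. */
    static final int POINTER_SIZE = Long.BYTES + 2 * Integer.BYTES;

    /** Reads the pointer fields; any read failure is rethrown with file diagnostics. */
    static long[] readPointer(Path marker) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(POINTER_SIZE);
        long fileSize = -1; // stays -1 if the file could not even be opened

        try (FileChannel ch = FileChannel.open(marker, StandardOpenOption.READ)) {
            fileSize = ch.size();

            // Read until the buffer is full or the end of the file is reached.
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // No-op: the loop condition performs the read.
            }

            buf.flip();

            return new long[] {buf.getLong(), buf.getInt(), buf.getInt()};
        } catch (BufferUnderflowException | IOException e) {
            throw new IOException("Failed to read checkpoint pointer from marker file [file=" +
                marker.toAbsolutePath() + ", size=" + fileSize + ']', e);
        }
    }
}
```

With an empty marker file, this sketch fails with an IOException whose message contains the file path and size=0, and whose cause is the original BufferUnderflowException.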

            People

              Denis Chudov
              Denis Chudov
              Sergey Chugunov
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m