Ignite / IGNITE-8879

Blinking baseline node is sometimes unable to connect to the cluster


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.5
    • Fix Version/s: 2.8
    • Component/s: None
    • Labels: None

    Description

      Almost the same scenario as in IGNITE-8874, but the node is removed from the baseline while it is blinking.

      All caches have 2 backups.
      The cluster has 4 nodes.

      1. Start the cluster and load data.
      2. Start a transactional load (8 threads, 100 ops/second, each op doing a put/get).
      3. Repeat 10 times: kill one node, remove it from the baseline, start the node again (without cleaning its LFS), and wait for rebalance.
      4. Run idle_verify and check for data corruption.
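The control flow of step 3 can be sketched as follows. This is a minimal sketch, assuming a hypothetical ClusterOps harness interface; none of its names come from Ignite. In a real test these operations might map to killing the node process, IgniteCluster.setBaselineTopology(...), Ignition.start(...) and a wait for rebalance, but that mapping is an assumption. The RecordingOps double (also hypothetical) only records the call order, so the sequence can be checked without a cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction over the test harness; these names are not Ignite APIs.
interface ClusterOps {
    void killNode(String nodeId);            // e.g. kill the node's process abruptly
    void removeFromBaseline(String nodeId);  // shrink the baseline so it excludes the killed node
    void startNode(String nodeId);           // restart reusing the same persistence (LFS) directory
    void awaitRebalance();                   // block until rebalance has finished
}

final class BlinkingNodeScenario {
    /** Step 3 of the reproducer: kill, remove from baseline, restart, wait; repeated. */
    static void run(ClusterOps ops, String nodeId, int iterations) {
        for (int i = 0; i < iterations; i++) {
            ops.killNode(nodeId);
            ops.removeFromBaseline(nodeId);
            ops.startNode(nodeId);   // LFS is deliberately not cleaned before restart
            ops.awaitRebalance();
        }
    }
}

/** Test double that records the order of operations instead of touching a real cluster. */
final class RecordingOps implements ClusterOps {
    final List<String> calls = new ArrayList<>();

    public void killNode(String id) { calls.add("kill " + id); }
    public void removeFromBaseline(String id) { calls.add("removeFromBaseline " + id); }
    public void startNode(String id) { calls.add("start " + id); }
    public void awaitRebalance() { calls.add("awaitRebalance"); }
}
```

Running BlinkingNodeScenario.run(new RecordingOps(), "node2", 10) yields 40 recorded calls, four per iteration, in the order listed in step 3.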

       

      At some point, the killed node is unable to start and join the cluster because of the following error:

      (Attachment contents: grid.1.node2.X.log are the blinking node's logs, where X is the iteration counter from step 3.)

      080ee8-END.bin]
      [2018-06-26 19:01:43,039][INFO ][main][PageMemoryImpl] Started page memory [memoryAllocated=100.0 MiB, pages=24800, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
      [2018-06-26 19:01:43,039][INFO ][main][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=583691, len=119], lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8]
      [2018-06-26 19:01:43,050][INFO ][main][GridCacheDatabaseSharedManager] Found last checkpoint marker [cpId=7fca4dbb-8f01-4b63-95e2-43283b080ee8, pos=FileWALPointer [idx=0, fileOff=583691, len=119]]
      [2018-06-26 19:01:43,082][INFO ][main][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: EOF at position [1000000] expected to read [1] bytes, ptr=FileWALPointer [idx=0, fileOff=1000000, len=0]
      [2018-06-26 19:01:43,219][WARN ][main][FileWriteAheadLogManager] WAL segment tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual state : {Index=3602879702215753728,Offset=775434544} ]
      [2018-06-26 19:01:43,243][INFO ][main][GridCacheDatabaseSharedManager] Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8]
      [2018-06-26 19:01:43,246][INFO ][main][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: EOF at position [1000000] expected to read [1] bytes, ptr=FileWALPointer [idx=0, fileOff=1000000, len=0]
      [2018-06-26 19:01:43,336][WARN ][main][FileWriteAheadLogManager] WAL segment tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual state : {Index=3602879702215753728,Offset=775434544} ]
      [2018-06-26 19:01:43,336][INFO ][main][GridCacheDatabaseSharedManager] Finished applying WAL changes [updatesApplied=0, time=101ms]
      [2018-06-26 19:01:43,450][INFO ][main][GridSnapshotAwareClusterStateProcessorImpl] Restoring history for BaselineTopology[id=4]
      [2018-06-26 19:01:43,454][ERROR][main][IgniteKernal] Exception during start processors, node will be stopped and close connections
      class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter []
              at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1769)
              at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1001)
              at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
              at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
              at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
              at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
              at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
              at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
              at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
              at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
              at org.apache.ignite.Ignition.start(Ignition.java:352)
              at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
      Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=1
              at org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)
              at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:222)
              at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:381)
              at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:643)
              at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:486)
              at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61)
              at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:700)
              at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1766)
              ... 11 more
      [2018-06-26 19:01:43,456][ERROR][main][IgniteKernal] Got exception while starting (will rollback startup routine).
      class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter []
              at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1769)
              at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1001)
              at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
              at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
              at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
              at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
              at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
              at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
              at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
              at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
              at org.apache.ignite.Ignition.start(Ignition.java:352)
              at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
      Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=1
              at org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)
              at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:222)
              at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:381)
              at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:643)
              at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:486)
              at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61)
              at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:700)
              at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1766)
              ... 11 more
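The root cause above says that restoring the BaselineTopology history expects a stored item for an earlier id and does not find it. The toy model below only illustrates how a single missing intermediate item produces an error of that shape. It is a sketch under stated assumptions: the class names are hypothetical, and the storage layout (a map from id to item, walked from 0 up to the current id) is assumed. It is not Ignite's BaselineTopologyHistory implementation and says nothing about why the item went missing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names and layout, not Ignite's BaselineTopologyHistory.
final class HistoryItem {
    final int id;

    HistoryItem(int id) { this.id = id; }
}

final class ToyBaselineHistory {
    /**
     * Rebuilds the history preceding baseline topology {@code currentId} by looking up
     * every earlier id in {@code stored}. Fails fast on the first gap in the chain.
     */
    static List<HistoryItem> restore(Map<Integer, HistoryItem> stored, int currentId) {
        List<HistoryItem> history = new ArrayList<>();
        for (int id = 0; id < currentId; id++) {
            HistoryItem item = stored.get(id);
            if (item == null)
                throw new IllegalStateException(
                    "Restoring of BaselineTopology history has failed, expected history item not found for id=" + id);
            history.add(item);
        }
        return history;
    }
}
```

With items 0, 2 and 3 stored and a current id of 4, restore(...) fails on id=1, matching the shape of the message in the log.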

      Attachments

        1. IGNITE-8879.zip (59 kB, uploaded by Dmitry Sherstobitov)


      People

        Assignee: Ivan Daschinsky (ivandasch)
        Reporter: Dmitry Sherstobitov (qvad)


      Time Tracking

        Original Estimate: Not Specified
        Remaining Estimate: 0h
        Time Spent: 20m