Ignite / IGNITE-7731

ClassCastException at restarted node if killed during checkpoint


Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.3
    • None
    • None
    • None

    Description

      During a failover test, the restarted node fails to start with the following exception:

      [2018-02-15 12:17:46,388][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=/storage/ssd/krybakova/20181502-120637-2.3.0-SNAPSHOT-failover-756ae8d4-c12-s24-p200000-r400000-b2-d7200/yardstick/work/db/node00-730721d0-e532-4f3a-b9e9-29277c0b7a9a/cp/1518685946892-39fa4858-66cb-4c88-9a1c-13a8625e1158-START.bin, endMarker=null]
      [2018-02-15 12:17:46,389][INFO ][exchange-worker-#62][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOffset=0, len=0, forceFlush=false], lastMarked=FileWALPointer [idx=1, fileOffset=47809760, len=177151, forceFlush=false], lastCheckpointId=39fa4858-66cb-4c88-9a1c-13a8625e1158]
      [2018-02-15 12:17:46,389][WARN ][exchange-worker-#62][GridCacheDatabaseSharedManager] Ignite node stopped in the middle of checkpoint. Will restore memory state and finish checkpoint on node start.
      [2018-02-15 12:17:46,448][ERROR][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Failed to reinitialize local partitions (preloading will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=38, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=b8684b5c-29f5-41db-bedc-f4b4ee4cab6b, addrs=[127.0.0.1, 172.25.1.49], sockAddrs=[lab49.gridgain.local/172.25.1.49:47500, /127.0.0.1:47500], discPort=47500, order=38, intOrder=37, lastExchangeTime=1518686253183, loc=true, ver=2.3.0#20180213-sha1:756ae8d4, isClient=false], topVer=38, nodeId8=b8684b5c, msg=null, type=NODE_JOINED, tstamp=1518686266006], nodeId=b8684b5c, evt=NODE_JOINED]
      java.lang.ClassCastException: org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl cannot be cast to org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryEx
       at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:1595)
       at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1533)
       at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:568)
       at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:724)
       at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:611)
       at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279)
       at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
       at java.lang.Thread.run(Thread.java:748)
      
      

      This happens if the node was killed during a checkpoint (apparently only during the first one).
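The log above reports a checkpoint start marker with `endMarker=null`, which the node itself describes as being stopped in the middle of a checkpoint. A minimal sketch of how that condition could be spotted when scanning logs of restarted nodes follows. The class and method names are hypothetical, the regular expression assumes the exact message format shown in this report, and the path in `main` is a shortened, made-up example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CheckpointStatusLine {
    // Matches the "Read checkpoint status" message format seen in this report (assumed stable).
    private static final Pattern STATUS =
        Pattern.compile("Read checkpoint status \\[startMarker=(.*?), endMarker=(.*?)\\]");

    /** Returns true if the line reports a start marker but no end marker. */
    public static boolean isInterruptedCheckpoint(String logLine) {
        Matcher m = STATUS.matcher(logLine);

        if (!m.find())
            return false; // Not a checkpoint status line.

        String start = m.group(1);
        String end = m.group(2);

        return !"null".equals(start) && "null".equals(end);
    }

    public static void main(String[] args) {
        // Hypothetical, shortened log line in the same format as the report.
        String line = "[INFO ] Read checkpoint status "
            + "[startMarker=/work/db/node00/cp/123-abc-START.bin, endMarker=null]";

        System.out.println(isInterruptedCheckpoint(line)); // prints true
    }
}
```

Run against the restarted node's log, such a check would flag restarts that followed an unfinished checkpoint, which is the precondition described above.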

      Load config:

      • Yardstick with CacheRandomOperationBenchmark
      • 12 client nodes, 24 server nodes, 12 hosts (2 per host). The issue also reproduces when the restarted node is the only node on its host.
      • Several caches with different configs: PDS/in-memory, tx/atomic, with/without eviction, etc. No dynamic caches. Complete configs are attached.
      • 1 node is restarted periodically.
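The exception says that a `PageMemoryNoStoreImpl` was cast to `PageMemoryEx` while memory was being restored, and the load config mixes persistent (PDS) and in-memory caches. One possible reading, which is an assumption and not a confirmed root cause, is that the restore path looked up page memory for a cache group that is not persistent. The sketch below uses stand-in types only (none of these are the real Ignite classes) to show how an unchecked downcast fails for such a group and how an `instanceof` guard would avoid it. The guard is illustrative, not the project's fix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PageMemoryCastSketch {
    /** Stand-in for a base page memory abstraction (hypothetical). */
    interface PageMemory {}

    /** Stand-in for the persistence-aware extension (hypothetical). */
    interface PageMemoryEx extends PageMemory {}

    /** Stand-in for page memory of an in-memory region. */
    static class NoStorePageMemory implements PageMemory {}

    /** Stand-in for page memory of a persistent region. */
    static class PersistentPageMemory implements PageMemoryEx {}

    /** Unchecked downcast: throws ClassCastException for in-memory regions. */
    static PageMemoryEx uncheckedLookup(Map<String, PageMemory> regions, String grp) {
        return (PageMemoryEx) regions.get(grp);
    }

    /** Guarded lookup: returns null so the caller can skip non-persistent groups. */
    static PageMemoryEx guardedLookup(Map<String, PageMemory> regions, String grp) {
        PageMemory mem = regions.get(grp);

        return mem instanceof PageMemoryEx ? (PageMemoryEx) mem : null;
    }

    public static void main(String[] args) {
        // Hypothetical group names mirroring a mixed PDS/in-memory setup.
        Map<String, PageMemory> regions = new LinkedHashMap<>();
        regions.put("pdsGroup", new PersistentPageMemory());
        regions.put("inMemoryGroup", new NoStorePageMemory());

        for (String grp : regions.keySet()) {
            try {
                uncheckedLookup(regions, grp);
                System.out.println(grp + ": unchecked cast ok");
            }
            catch (ClassCastException e) {
                System.out.println(grp + ": unchecked cast failed");
            }

            boolean skipped = guardedLookup(regions, grp) == null;
            System.out.println(grp + ": guarded lookup " + (skipped ? "skipped" : "returned a persistent region"));
        }
    }
}
```

If this reading is right, it would also be consistent with the issue appearing only in configurations that mix persistent and in-memory caches, but the attached configs and logs would need to confirm that.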

      Logs of the restarted node are attached.

       

      Attachments

        1. run-load.properties
          4 kB
          Ksenia Rybakova
        2. run-load.xml
          2 kB
          Ksenia Rybakova
        3. ignite-base-load-config.xml
          24 kB
          Ksenia Rybakova
        4. 121729_id11-1_172.25.1.49cache-random-benchmark-2-backup.log
          64 kB
          Ksenia Rybakova
        5. 120728_id11_172.25.1.49_cache-random-benchmark-2-backup.log
          173 kB
          Ksenia Rybakova


            People

              Assignee: Unassigned
              Reporter: Ksenia Rybakova (krybakova)