Ignite / IGNITE-12760

Prevent AssertionError on message unmarshalling when classLoaderId contains the ID of a node that has already left

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9
    • Component/s: None
    • Labels: None

    Description

      The following assertion error triggers the failure handler and crashes the node. It can potentially crash the whole cluster.

      2020-02-18 14:34:09.775[ERROR][query-#146129%DPL_GRID%DplGridNodeName%][o.a.i.i.p.cache.GridCacheIoManager] Failed to process message [senderId=727757ed-4ad4-4779-bda9-081525725cce, msg=GridCacheQueryRequest [id=178, cacheName=com.sbt.tokenization.data.entity.KEKEntity_DPL_union-module, type=SCAN, fields=false, clause=null, clsName=null, keyValFilter=null, rdc=null, trans=null, pageSize=1024, incBackups=false, cancel=false, incMeta=false, all=false, keepBinary=true, subjId=727757ed-4ad4-4779-bda9-081525725cce, taskHash=0, part=-1, topVer=AffinityTopologyVersion [topVer=97, minorTopVer=0], sendTimestamp=-1, receiveTimestamp=-1, super=GridCacheIdMessage [cacheId=-1129073400, super=GridCacheMessage [msgId=179, depInfo=GridDeploymentInfoBean [clsLdrId=c32670e3071-d30ee64b-0833-45d4-abbe-fb6282669caa, depMode=SHARED, userVer=0, locDepOwner=false, participants=null], lastAffChangedTopVer=AffinityTopologyVersion [topVer=8, minorTopVer=6], err=null, skipPrepare=false]]]]
      java.lang.AssertionError: null
      at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CachedDeploymentInfo.<init>(GridCacheDeploymentManager.java:918)
      at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CachedDeploymentInfo.<init>(GridCacheDeploymentManager.java:889)
      at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager.p2pContext(GridCacheDeploymentManager.java:422)
      at org.apache.ignite.internal.processors.cache.GridCacheIoManager.unmarshall(GridCacheIoManager.java:1576)
      at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:584)
      at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:386)
      at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:312)
      at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:102)
      at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:301)
      at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1565)
      at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1189)
      at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:130)
      at org.apache.ignite.internal.managers.communication.GridIoManager$8.run(GridIoManager.java:1092)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
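The log record above already shows the mismatch described in the title: the node part of clsLdrId (d30ee64b-...) differs from senderId (727757ed-...), and participants=null. The minimal sketch below extracts that node part, assuming clsLdrId is printed as <hex local id>-<node UUID> (an assumption based on the value in this log, not something the issue states). ClsLdrIdCheck and nodeIdOf are hypothetical names, not Ignite API.

```java
import java.util.UUID;

/** Hypothetical helper (not Ignite API) for reading the log record above. */
public class ClsLdrIdCheck {
    /**
     * Returns the node UUID embedded in a class loader ID string, assuming the
     * "<hex local id>-<node UUID>" layout seen in the log.
     */
    static UUID nodeIdOf(String clsLdrId) {
        int dash = clsLdrId.indexOf('-');

        if (dash < 0)
            throw new IllegalArgumentException("Unexpected class loader ID: " + clsLdrId);

        // Everything after the first dash is taken as the UUID of the node that owns the loader.
        return UUID.fromString(clsLdrId.substring(dash + 1));
    }

    public static void main(String[] args) {
        // Values copied from the log record above.
        UUID senderId = UUID.fromString("727757ed-4ad4-4779-bda9-081525725cce");
        String clsLdrId = "c32670e3071-d30ee64b-0833-45d4-abbe-fb6282669caa";
        boolean hasParticipants = false; // participants=null in the log

        UUID owner = nodeIdOf(clsLdrId);

        System.out.println("loader owner: " + owner);
        System.out.println("sender:       " + senderId);
        // prints: owner is sender: false, participants present: false
        System.out.println("owner is sender: " + owner.equals(senderId)
            + ", participants present: " + hasParticipants);
    }
}
```

In other words, the message names a class loader owned by a node that is neither the sender nor a listed participant, which is the combination the receiving side does not expect.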

      There is no reliable reproducer yet, but it seems we should prevent this situation in general, as follows:
      1) Check the correctness of the message before it is sent, inside GridCacheDeploymentManager#prepare. If the corresponding class loader exists on the local node, we can try to fix the message by replacing the wrong class loader with the local one.
      2) Log suspicious deployments received from GridDeploymentManager#deploy; there may be obsolete deployments in the caches.
      3) Possibly remove this assertion altogether. The sender node should have this class and use its own class loader as the class loader ID; if it does not, the receiver will get an exception in finishUnmarshall (Failed to peer load class) and can handle it with GridCacheIoManager#processFailedMessage.
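Suggestions 1) and 2) can be sketched in plain Java under stated assumptions. This is a hypothetical model, not Ignite code: LoaderId, prepare, the set of alive node IDs and the map of locally available class loaders are invented for illustration and do not reflect the real GridCacheDeploymentManager#prepare signature. The sketch keeps the loader if its owner node is still in the topology, otherwise substitutes a local loader for the same class, and logs the case where neither is possible.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.logging.Logger;

/** Hypothetical model of suggestions 1) and 2); none of these types are Ignite API. */
public class DeploymentFixSketch {
    private static final Logger LOG = Logger.getLogger(DeploymentFixSketch.class.getName());

    /** Simplified stand-in for a class loader ID: owning node plus a local counter. */
    static final class LoaderId {
        final UUID nodeId;
        final long localId;

        LoaderId(UUID nodeId, long localId) {
            this.nodeId = nodeId;
            this.localId = localId;
        }

        @Override public String toString() {
            return Long.toHexString(localId) + "-" + nodeId;
        }
    }

    /**
     * Returns the loader ID to put into an outgoing message: the original one if
     * its owner is still alive, else a local loader for the same class if known,
     * else the original one unchanged (after logging it as suspicious).
     */
    static LoaderId prepare(String clsName, LoaderId ldrId, Set<UUID> aliveNodes,
        Map<String, LoaderId> localLoaders) {
        if (aliveNodes.contains(ldrId.nodeId))
            return ldrId; // Owner is still in the topology: nothing to fix.

        LoaderId local = localLoaders.get(clsName);

        if (local != null) {
            // Suggestion 1): replace the stale loader with the local one.
            LOG.warning("Replacing class loader of departed node [cls=" + clsName
                + ", old=" + ldrId + ", new=" + local + ']');
            return local;
        }

        // Suggestion 2): at least log the suspicious deployment.
        LOG.warning("Suspicious deployment, owner node left and no local loader [cls="
            + clsName + ", ldrId=" + ldrId + ']');
        return ldrId;
    }

    public static void main(String[] args) {
        UUID self = UUID.randomUUID();
        UUID departed = UUID.randomUUID();
        LoaderId stale = new LoaderId(departed, 0xc32670e3071L);
        LoaderId local = new LoaderId(self, 1L);

        // "com.example.KeyFilter" is a hypothetical class name.
        LoaderId fixed = prepare("com.example.KeyFilter", stale, Set.of(self),
            Map.of("com.example.KeyFilter", local));

        System.out.println(fixed == local); // prints: true (stale loader replaced)
    }
}
```

The last branch leaves the message unchanged, so whether the receiver then survives depends on something like suggestion 3) being in place.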

            People

              Assignee: Denis Chudov
              Reporter: Denis Chudov
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m