Description
A NullPointerException may be thrown during a cluster topology change:
[14:15:49,820][SEVERE][exchange-worker-#63][GridDhtPartitionsExchangeFuture] Failed to reinitialize local partitions (rebalancing will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=468, minorTopVer=1], discoEvt=DiscoveryCustomEvent [customMsg=DynamicCacheChangeBatch [id=728f11e1c61-11d31f36-508d-47e0-9a9c-d4f5a270948d, reqs=[DynamicCacheChangeRequest [cacheName=SQL_PUBLIC_UPRIYA_112093_TB, hasCfg=true, nodeId=10a0b1a4-09bb-4aa6-81e0-537a6431283b, clientStartOnly=false, stop=false, destroy=false, disabledAfterStartfalse]], exchangeActions=ExchangeActions [startCaches=[SQL_PUBLIC_UPRIYA_112093_TB], stopCaches=null, startGrps=[SQL_PUBLIC_UPRIYA_112093_TB], stopGrps=[], resetParts=null, stateChangeRequest=null], startCaches=false], affTopVer=AffinityTopologyVersion [topVer=468, minorTopVer=1], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=10a0b1a4-09bb-4aa6-81e0-537a6431283b, addrs=[0:0:0:0:0:0:0:1%lo, 10.244.1.100, 127.0.0.1], sockAddrs=[/10.244.1.100:0, /0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0], discPort=0, order=39, intOrder=27, lastExchangeTime=1563872413854, loc=false, ver=2.7.0#20181130-sha1:256ae401, isClient=true], topVer=468, nodeId8=6a076901, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1563891349722]], nodeId=10a0b1a4, evt=DISCOVERY_CUSTOM_EVT]
java.lang.NullPointerException
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.canSkipJoiningNodes(ExchangeLatchManager.java:327)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1401)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:806)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2667)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at java.lang.Thread.run(Thread.java:745)
The original topic on the user-list: http://apache-ignite-users.70518.x6.nabble.com/Ignite-2-7-0-server-node-null-pointer-exception-td28899.html
RESOLUTION
The likely cause of the issue is a value of IGNITE_DISCOVERY_HISTORY_SIZE that is too small (smaller than the number of nodes joining or leaving the cluster simultaneously). I could not reproduce the issue with the default values of TcpDiscoverySpi#topHistSize and IGNITE_DISCOVERY_HISTORY_SIZE, so I assume that the user changed this property.
Therefore, the NullPointerException was replaced with an IgniteException whose message provides a hint on how to resolve the issue. It may also be worth changing the implementation of ExchangeLatchManager to use a DiscoCache instance instead of AffinityTopologyVersion. That approach has pros and cons, so it requires additional investigation.
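
To illustrate the failure mode and the shape of the fix, here is a minimal sketch in plain Java. It is not Ignite code: the class BoundedTopologyHistory, its methods, and the message wording are hypothetical, and IllegalStateException stands in for IgniteException so that the example has no dependencies. It assumes a history that keeps only the most recent N topology versions, so a lookup for an evicted version finds nothing. Instead of letting that null propagate into an NPE, the lookup fails with a message that points at the setting.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a size-limited discovery history; not Ignite code. */
class BoundedTopologyHistory {
    private final Map<Long, Collection<String>> snapshots;

    BoundedTopologyHistory(final int maxSize) {
        // Insertion-ordered map that evicts the oldest version once maxSize is exceeded.
        this.snapshots = new LinkedHashMap<Long, Collection<String>>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Collection<String>> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Stores the node IDs that were alive at the given topology version. */
    void record(long topVer, Collection<String> nodeIds) {
        snapshots.put(topVer, nodeIds);
    }

    /** Returns the node IDs for topVer, or fails with a hint instead of returning null. */
    Collection<String> nodes(long topVer) {
        Collection<String> nodes = snapshots.get(topVer);

        if (nodes == null) {
            // Assumption: the real fix throws IgniteException; the wording here is illustrative.
            throw new IllegalStateException("Topology version " + topVer
                + " is no longer in the discovery history; consider increasing"
                + " IGNITE_DISCOVERY_HISTORY_SIZE.");
        }

        return nodes;
    }
}
```

With a history size of 2, recording versions 1, 2 and 3 evicts version 1, so a later nodes(1) call throws the descriptive exception rather than handing a null to the caller.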