Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 2.3
Description
From time to time, an Ignite cluster with deployed services throws the following exception while some of its nodes are being restarted:
java.lang.IllegalStateException: Getting affinity for topology version earlier than affinity is calculated [locNode=TcpDiscoveryNode [id=c770dbcf-2908-442d-8aa0-bf26a2aecfef, addrs=[10.44.162.169, 127.0.0.1], sockAddrs=[clrv0000041279.ic.ing.net/10.44.162.169:56500, /127.0.0.1:56500], discPort=56500, order=11, intOrder=8, lastExchangeTime=1520931375337, loc=true, ver=2.3.3#20180213-sha1:f446df34, isClient=false], grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], head=AffinityTopologyVersion [topVer=15, minorTopVer=0], history=[AffinityTopologyVersion [topVer=11, minorTopVer=0], AffinityTopologyVersion [topVer=11, minorTopVer=1], AffinityTopologyVersion [topVer=12, minorTopVer=0], AffinityTopologyVersion [topVer=15, minorTopVer=0]]]
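The cause appears to be a data race in the GridServiceProcessor class. In the trace above, affinity is requested for topVer=13 while the head is already at topVer=15 and the history holds no entry for version 13.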
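To make the suspected interleaving concrete, here is a minimal, self-contained sketch of a check-then-act race against a bounded version history. It is not Ignite code: the class AffinityHistoryRaceSketch, its methods, and the eviction policy (MAX_HIST) are hypothetical stand-ins, and the real affinity history may drop versions for other reasons. The sketch only shows how a readiness check can succeed and the later lookup for the same topVer can still fail once the topology has moved on in between.

```java
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical stand-in for a bounded affinity history; not Ignite code. */
public class AffinityHistoryRaceSketch {
    /** Assumption: only the newest MAX_HIST topology versions are kept. */
    static final int MAX_HIST = 2;

    private final ConcurrentSkipListMap<Integer, String> hist = new ConcurrentSkipListMap<>();

    /** Records the assignment for a new topology version and trims old entries. */
    public void onTopologyChange(int topVer) {
        hist.put(topVer, "assignment@" + topVer);

        while (hist.size() > MAX_HIST)
            hist.pollFirstEntry();
    }

    /** Step 1 of the racy sequence: true once affinity for topVer is calculated. */
    public boolean isReady(int topVer) {
        return !hist.isEmpty() && hist.lastKey() >= topVer;
    }

    /** Step 2: fails if topVer is no longer present in the history. */
    public String assignment(int topVer) {
        String res = hist.get(topVer);

        if (res == null)
            throw new IllegalStateException("Getting affinity for topology version earlier than " +
                "affinity is calculated [topVer=" + topVer + ", head=" + hist.lastKey() + ']');

        return res;
    }

    /** Replays the interleaving sequentially; returns true if the exception occurs. */
    public static boolean reproduce() {
        AffinityHistoryRaceSketch aff = new AffinityHistoryRaceSketch();

        aff.onTopologyChange(12);
        aff.onTopologyChange(13);

        int topVer = 13;

        // Plays the role of affinityReadyFuture(topVer).get(): the check passes.
        if (!aff.isReady(topVer))
            return false;

        // The window where the delay is injected: nodes restart in the meantime.
        aff.onTopologyChange(14);
        aff.onTopologyChange(15);

        try {
            // Plays the role of reassign(dep, topVer): uses the now stale version.
            aff.assignment(topVer);

            return false;
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage());

            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("reproduced=" + reproduce());
    }
}
```

The topology changes are replayed on one thread so the outcome is deterministic; in a real cluster they would arrive from another thread, which is why the failure shows up only from time to time.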
How to reproduce:
1) To simulate the data race, modify the following place in the source code:
Class: GridServiceProcessor
Method: @Override public void onEvent(final DiscoveryEvent evt, final DiscoCache discoCache) {
Place:
....
try {
    svcName.set(dep.configuration().getName());

    ctx.cache().internalCache(UTILITY_CACHE_NAME).context().affinity().
        affinityReadyFuture(topVer).get();

    //HERE (between GET and REASSIGN) you should add Thread.sleep(100) for example.
    //try {
    //    Thread.sleep(100);
    //}
    //catch (InterruptedException e1) {
    //    e1.printStackTrace();
    //}

    reassign(dep, topVer);
}
catch (IgniteCheckedException ex)
...
2) After that, simulate repeated node start/shutdown iterations. To reproduce the issue I used GridServiceProcessorBatchDeploySelfTest (the timeout on future.get should be increased to avoid a timeout error).
Issue Links
- is related to
  IGNITE-11465 Multiple client leave/join events may wipe affinity assignment history and cause transactions fail (Resolved)
  IGNITE-12014 Getting affinity for topology version earlier than affinity is calculated for system cache (Resolved)