Description
To reproduce failure, run it several times, for example, set up IDE to run test with 'repeat until failure' option. Then you will get an assertion error:
java.lang.AssertionError: Number of jobs must be equal to the cluster size (except local node): [a2844419-3081-432a-b611-c4f891900005]
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
With applied patch [1] exception would be as follows:
java.lang.AssertionError: Number of jobs must be equal to the cluster size (except local node): [e7346d3b-b257-466c-95c2-0a85a7600005], count: 1
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
It seems to be a concurrent update problem of thread unsafe HasSet (see [2, 3]):
Unsafe HashSet
Set<UUID> assigns = new HashSet<>();
Concurrent update
grid(i).context().io().addMessageListener(GridTopic.TOPIC_JOB, new GridMessageListener() { @Override public void onMessage(UUID nodeId, Object msg, byte plc) { if (msg instanceof GridJobExecuteRequest) { GridJobExecuteRequest msg0 = (GridJobExecuteRequest)msg; if (msg0.getTaskName().contains(SnapshotPartitionsVerifyTask.class.getName())) assigns.add(locNodeId); } } });
With concurrent Set implementation problem is not reproducing (see patch [4]):
Set<UUID> assigns = Collections.newSetFromMap(new ConcurrentHashMap<>());
Attachments
Attachments
Issue Links
- links to