[IGNITE-14671] Test IgniteClusterSnapshotCheckTest#testClusterSnapshotCheckOtherCluster is flaky - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Trivial
Resolution: Fixed
Affects Version/s: 2.10
Fix Version/s: 2.11
Component/s: None
Labels:
- iep-43
- ise

Description

To reproduce the failure, run the test several times, for example by configuring the IDE to run it with the 'repeat until failure' option. It eventually fails with an assertion error:

java.lang.AssertionError: Number of jobs must be equal to the cluster size (except local node): [a2844419-3081-432a-b611-c4f891900005]

	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
	at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)

With patch [1] applied, the exception message also includes the count:

java.lang.AssertionError: Number of jobs must be equal to the cluster size (except local node): [e7346d3b-b257-466c-95c2-0a85a7600005], count: 1

	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
	at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)

This appears to be a concurrent-update problem with the thread-unsafe HashSet (see [2, 3]):

Unsafe HashSet

Set<UUID> assigns = new HashSet<>();

Concurrent update

grid(i).context().io().addMessageListener(GridTopic.TOPIC_JOB, new GridMessageListener() {
    @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
        if (msg instanceof GridJobExecuteRequest) {
            GridJobExecuteRequest msg0 = (GridJobExecuteRequest)msg;

            if (msg0.getTaskName().contains(SnapshotPartitionsVerifyTask.class.getName()))
                assigns.add(locNodeId);
        }
    }
});

With a concurrent Set implementation, the problem no longer reproduces (see patch [4]):

Set<UUID> assigns = Collections.newSetFromMap(new ConcurrentHashMap<>());
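The idea behind the fix can be tried in isolation. The following is only a minimal standalone sketch, not the test's actual code: the class name ConcurrentAssignsSketch, the method recordConcurrently and the synthetic UUIDs are hypothetical stand-ins for the message listeners and locNodeId above. It assumes that each listener call may arrive on a different thread and shows that a set backed by ConcurrentHashMap keeps every id added that way.

```java
import java.util.Collections;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentAssignsSketch {
    /**
     * Simulates {@code threads} listeners, each adding its own id to the
     * shared set at roughly the same moment, and returns the final set size.
     */
    static int recordConcurrently(Set<UUID> assigns, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);

        for (int i = 0; i < threads; i++) {
            // Hypothetical stand-in for locNodeId: one distinct id per simulated node.
            UUID nodeId = new UUID(0L, i);

            pool.submit(() -> {
                try {
                    start.await(); // Release all workers together to provoke contention.
                    assigns.add(nodeId);
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        start.countDown();
        pool.shutdown();

        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS))
                throw new IllegalStateException("Workers did not finish in time");
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        return assigns.size();
    }

    public static void main(String[] args) {
        // Same construction as in the fix: a thread-safe Set view over ConcurrentHashMap.
        Set<UUID> assigns = Collections.newSetFromMap(new ConcurrentHashMap<>());

        System.out.println("Recorded ids: " + recordConcurrently(assigns, 8));
    }
}
```

Passing a plain new HashSet<>() to the same method gives no such guarantee: HashSet is not synchronized, so concurrent add calls may occasionally lose an element, which is consistent with the "count: 1" in the failure above.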

Attachments

- testClusterSnapshotCheckOtherCluster_fix.patch (29/Apr/21 15:45, 2 kB, Ilya Shishkov)
- testClusterSnapshotCheckOtherCluster_printCount.patch (29/Apr/21 15:23, 1 kB, Ilya Shishkov)

Issue Links

links to

GitHub Pull Request #9068

People

Assignee: Ilya Shishkov

Reporter: Ilya Shishkov

Votes: 0

Watchers: 3

Dates

Created: 29/Apr/21 15:34

Updated: 06/Oct/22 17:26

Resolved: 30/Apr/21 09:44

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 20m