[IGNITE-21565] ReplicasSafeTimePropagationTest#testSafeTimeReorderingOnLeaderReElection is flaky - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0
Component/s: None
Labels:
- ignite-3

Epic Link:
Data Consistency TC Green Again Backlog

Description

java.lang.AssertionError: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException
  at org.apache.ignite.internal.testframework.matchers.CompletableFutureMatcher.matchesSafely(CompletableFutureMatcher.java:78)
  at org.apache.ignite.internal.testframework.matchers.CompletableFutureMatcher.matchesSafely(CompletableFutureMatcher.java:35)
  at org.hamcrest.TypeSafeMatcher.matches(TypeSafeMatcher.java:67)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:10)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
  at org.apache.ignite.distributed.ReplicasSafeTimePropagationTest.sendSafeTimeSyncCommand(ReplicasSafeTimePropagationTest.java:231)

https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_IntegrationTests_ModuleTable/7867487?expandBuildDeploymentsSection=false&hideProblemsFromDependencies=false&hideTestsFromDependencies=false&expandBuildChangesSection=true&expandBuildTestsSection=true&expandBuildProblemsSection=true

Upd #1

SafeTimeReorderingException occurred because of a race between the maxObservableSafeTime update on leader election and SafeTimeSyncCommand processing within onBeforeApply. To fix that:

- I've added a synchronous onBeforeLeaderStart callback that is now used to update maxObservableSafeTime with clock.now() + CLOCK_SKEW, instead of the previously used asynchronous onLeaderStart. By asynchronous here I mean asynchronous with respect to onBeforeApply.
- I've also added a maxObservableSafeTime update to Long.MAX_VALUE on each leader stop, and the same value is used as the initial one. Long.MAX_VALUE is greater than any possible safeTime propagated with commands, so any SafeTimeSyncCommand processing (even within onBeforeApply) is blocked until a leader election, which includes proper maxObservableSafeTime initialization.
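
The two changes above can be sketched roughly as follows. This is only an illustration, not the code from the pull request: the class name SafeTimeGate, the boolean return value (the real code reports the problem via SafeTimeReorderingException), the injected clock, and the exact comparison in onBeforeApply are all assumptions made for the example.

```java
import java.util.function.LongSupplier;

/** Hypothetical illustration of the fix; not the actual Ignite class. */
final class SafeTimeGate {
    private final LongSupplier clock;   // stands in for clock.now()
    private final long clockSkew;       // stands in for CLOCK_SKEW

    // Initial value blocks every command until a leader is elected.
    private long maxObservableSafeTime = Long.MAX_VALUE;

    SafeTimeGate(LongSupplier clock, long clockSkew) {
        this.clock = clock;
        this.clockSkew = clockSkew;
    }

    /** Runs synchronously with respect to onBeforeApply (same monitor). */
    synchronized void onBeforeLeaderStart() {
        maxObservableSafeTime = clock.getAsLong() + clockSkew;
    }

    /** On leader stop, fall back to the blocking value. */
    synchronized void onLeaderStop() {
        maxObservableSafeTime = Long.MAX_VALUE;
    }

    /**
     * Returns true if the command may proceed, false if its safe time
     * would go backwards (assumed here to be the reordering condition).
     */
    synchronized boolean onBeforeApply(long proposedSafeTime) {
        if (proposedSafeTime < maxObservableSafeTime) {
            return false;
        }
        maxObservableSafeTime = proposedSafeTime;
        return true;
    }
}
```

Because all three methods share one monitor in this sketch, a command can never observe a half-initialized maxObservableSafeTime: it sees either Long.MAX_VALUE (and is rejected) or the value set during leader start.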

Issue Links

blocks
- IGNITE-21639 AssertionError: "Safe time reordering detected" after a massive data load in 3-node cluster (Reopened)

causes
- IGNITE-23001 ItEstimatedSizeTest.testEstimatedSizeAfterScaleUp is unstable (Resolved)

links to
- GitHub Pull Request #4552

People

Assignee: Alexander Lapin
Reporter: Alexander Lapin
Reviewer: Mirza Aliev
Votes: 0
Watchers: 1

Dates

Created: 20/Feb/24 06:33
Updated: 15/Oct/24 15:38
Resolved: 15/Oct/24 15:38

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

2h 50m