Details
- Bug
- Status: Resolved
- Blocker
- Resolution: Fixed
- None
- None
Description
Motivation
The logic for setting safeTime explicitly prohibits setting a larger time ahead of a smaller one. In other words, all data updates within storages must be strictly ordered by the safeTime associated with them. Currently this does not hold:
- We associate an update with its safeTime when the update command is created (see org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener)
UpdateCommandBuilder bldr = MSG_FACTORY.updateCommand() ... .safeTimeLong(hybridClock.nowLong());
- However, neither applying a given command locally nor sending it to Raft is linearized with the associated safeTime value. In other words, we may assign t0 to cmd0 and t1 to cmd1, with t0 < t1, but apply cmd1 before cmd0 locally.
Simply put, we lack synchronization between assigning a safeTime and applying the command.
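To make the violated invariant concrete, here is a minimal sketch of a safeTime holder that refuses to move backwards. `MonotonicSafeTime` and its methods are hypothetical names introduced for illustration only; they are not the Ignite classes and do not reflect how the storage actually tracks safeTime.

```java
/** Hypothetical stand-in for a storage-side safeTime holder (not an Ignite class). */
final class MonotonicSafeTime {
    private long current;

    /** Advances safeTime; a smaller value arriving after a larger one is rejected. */
    synchronized void advance(long candidate) {
        if (candidate < current) {
            throw new IllegalStateException(
                "safeTime regression: " + candidate + " < " + current);
        }
        current = candidate;
    }

    /** Returns the last accepted safeTime. */
    synchronized long current() {
        return current;
    }
}
```

With this holder, applying cmd1 (t1) before cmd0 (t0 < t1) makes the second `advance` call fail, which is the reordering described above.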
Definition of Done
- The application of updates must be linearized so that safeTime is only ever adjusted monotonically.
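One way to picture the required linearization is to take the timestamp and apply the update inside the same critical section. The sketch below is only an illustration under stated assumptions: `LinearizedUpdater` is a hypothetical class, an `AtomicLong` counter stands in for the hybrid clock, and the `Runnable` stands in for the local storage update. It does not cover replication through Raft.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: assign safeTime and apply the update under one lock. */
final class LinearizedUpdater {
    private final Object lock = new Object();
    // Assumption: a simple counter stands in for the hybrid clock.
    private final AtomicLong clock = new AtomicLong();
    // Records the safeTime of each update in the order it was applied.
    private final List<Long> appliedSafeTimes = new ArrayList<>();

    /** Takes a timestamp and applies the update atomically with respect to other updates. */
    long update(Runnable localApply) {
        synchronized (lock) {
            long safeTime = clock.incrementAndGet();
            localApply.run();
            appliedSafeTimes.add(safeTime);
            return safeTime;
        }
    }

    /** Returns a copy of the safeTimes in application order. */
    List<Long> appliedSafeTimes() {
        synchronized (lock) {
            return new ArrayList<>(appliedSafeTimes);
        }
    }
}
```

Because no other thread can take a timestamp between the clock read and the apply, the recorded safeTimes are strictly increasing regardless of how many threads call `update` concurrently. The cost is that all updates to the same partition are serialized on one lock.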
Implementation Notes
Different options are possible:
- We may reject a command whose safeTime is less than the one already applied. Such an approach requires:
  - Resending the command with a new safeTime in the case of 1pc.
  - Adjusting the local safeTime and resending the command with the new safeTime in the case of 2pc.
- Add proper synchronization on both the client and server sides.
- Send pending safeTime instances with each command. More details below:
Let’s assume that there are two updateCommands, cmd1(safeTime: t1) and cmd2(safeTime: t2). Let’s also assume that cmd2 was sent prior to cmd1 (meaning that they were reordered). In that case, assuming that cmd2 carries both t1 and t2 within its data bag, it will wait in a queue for cmd1 to bring its data; more formally, it will wait for all previous commands to be applied.
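The pending-safeTime option can be sketched as follows. This is a simplified illustration, not the actual design: `PendingAwareCommand` and `PendingAwareApplier` are hypothetical names, safeTimes are plain `long` values, and each command is assumed to list the earlier safeTimes that were still in flight when it was created (its own safeTime is kept in a separate field).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical command: its own safeTime plus the earlier safeTimes still pending at creation. */
final class PendingAwareCommand {
    final long safeTime;
    final Set<Long> pendingBefore;

    PendingAwareCommand(long safeTime, Set<Long> pendingBefore) {
        this.safeTime = safeTime;
        this.pendingBefore = new HashSet<>(pendingBefore);
    }
}

/** Hypothetical receiver that holds a command back until all its predecessors are applied. */
final class PendingAwareApplier {
    // Sketch only: this set grows without bound; a real implementation would prune it.
    private final Set<Long> applied = new HashSet<>();
    private final List<PendingAwareCommand> waiting = new ArrayList<>();
    private final List<Long> applyOrder = new ArrayList<>();

    /** Queues the command, then applies every queued command whose predecessors are done. */
    synchronized void receive(PendingAwareCommand cmd) {
        waiting.add(cmd);
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (int i = 0; i < waiting.size(); i++) {
                PendingAwareCommand c = waiting.get(i);
                if (applied.containsAll(c.pendingBefore)) {
                    waiting.remove(i);
                    applied.add(c.safeTime);
                    applyOrder.add(c.safeTime);
                    progressed = true;
                    break; // restart the scan: the list changed
                }
            }
        }
    }

    /** Returns a copy of the safeTimes in the order the commands were applied. */
    synchronized List<Long> applyOrder() {
        return new ArrayList<>(applyOrder);
    }
}
```

In the scenario above, cmd2 arrives first with t1 listed as pending, so it stays in the queue; once cmd1 arrives and is applied, cmd2 is released and the application order is t1, then t2.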
Issue Links
- causes
  - IGNITE-20716 Partial data loss after node restart (Resolved)
  - IGNITE-20577 Partial data loss after node restart (Closed)
  - IGNITE-20441 ItRebalanceRecoveryTest is flaky (Resolved)
- relates to
  - IGNITE-20834 SQL query may hang forever after node restart (Closed)