Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
According to https://cwiki.apache.org/confluence/display/IGNITE/IEP-98%3A+Schema+Synchronization#IEP98:SchemaSynchronization-Waitingforsafetimeinthepast , fetching a schema may require waiting until that schema becomes available. If such waits happen inside a PartitionListener, JRaft threads may be blocked for a noticeable time (possibly seconds). We should avoid this.
In RW transactions
When a primary node is about to process a request, it first waits until it has all the schema versions for the corresponding timestamp Top (beginTs or commitTs), i.e. until MS SafeTime >= Top. This wait happens outside of JRaft threads. The primary then obtains the global schema revision SR of the latest schema update that is not later than Top, builds a command with that SR inside, and submits it to RAFT.
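The primary-side flow can be sketched in Java as follows. This is a minimal sketch under assumptions: SafeTimeTracker (a stand-in for MS SafeTime), the schema history map (activation timestamp to SR), UpdateCommand and PrimarySchemaAwareSketch are all hypothetical names, not Ignite classes. It shows the wait expressed as a future, so no JRaft thread ever blocks on it; submitting the command to RAFT is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for MS SafeTime tracking: completes waiters once safe time reaches their timestamp. */
final class SafeTimeTracker {
    private long safeTime;
    private final NavigableMap<Long, CompletableFuture<Void>> waiters = new TreeMap<>();

    /** Returns a future completing when safe time >= ts; never blocks the caller. */
    synchronized CompletableFuture<Void> waitFor(long ts) {
        if (safeTime >= ts) {
            return CompletableFuture.completedFuture(null);
        }
        return waiters.computeIfAbsent(ts, k -> new CompletableFuture<>());
    }

    /** Called when MS SafeTime advances (e.g. from a Metastorage watch thread, not a JRaft thread). */
    void advance(long newSafeTime) {
        List<CompletableFuture<Void>> ready;
        synchronized (this) {
            safeTime = Math.max(safeTime, newSafeTime);
            NavigableMap<Long, CompletableFuture<Void>> reached = waiters.headMap(safeTime, true);
            ready = new ArrayList<>(reached.values());
            reached.clear(); // the view's clear() also removes the entries from the backing map
        }
        ready.forEach(f -> f.complete(null)); // complete outside the lock
    }
}

/** Hypothetical RAFT command carrying the schema revision chosen by the primary. */
record UpdateCommand(long timestamp, long schemaRevision, String payload) {}

final class PrimarySchemaAwareSketch {
    private final SafeTimeTracker msSafeTime;
    /** Hypothetical schema history: activation timestamp -> global schema revision (SR); assumed to contain an entry at 0. */
    private final NavigableMap<Long, Long> schemaRevisions;

    PrimarySchemaAwareSketch(SafeTimeTracker msSafeTime, NavigableMap<Long, Long> schemaRevisions) {
        this.msSafeTime = msSafeTime;
        this.schemaRevisions = schemaRevisions;
    }

    /** opTs is Top: beginTs or commitTs, depending on the request. */
    CompletableFuture<UpdateCommand> prepare(long opTs, String payload) {
        return msSafeTime.waitFor(opTs).thenApply(ignored -> {
            // Once SafeTime >= Top, every schema update not later than Top is known locally.
            long sr = schemaRevisions.floorEntry(opTs).getValue();
            return new UpdateCommand(opTs, sr, payload);
        });
    }
}
```

With schema updates activated at 0, 100 and 200, a request with Top = 150 stays pending until SafeTime reaches 150 and is then built with the SR of the update activated at 100.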
When building an AppendEntriesRequest, the Replicator inspects all the entries it includes, extracts the SR from each of them, takes their maximum (MSR, for ‘max schema revision’) and puts it into the AppendEntriesRequest.
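The MSR computation is a plain maximum over the batch. In this sketch, BatchEntry and MaxSchemaRevision are hypothetical names; entries that carry no SR (for example, configuration entries) are assumed to contribute nothing, and an empty result means the request needs no schema check.

```java
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical view of a log entry in an AppendEntries batch; schemaRevision is empty if the entry has no SR. */
record BatchEntry(long index, OptionalLong schemaRevision) {}

final class MaxSchemaRevision {
    /** Returns the max SR over the batch (the MSR), or empty if no entry carries an SR. */
    static OptionalLong of(List<BatchEntry> batch) {
        return batch.stream()
                .map(BatchEntry::schemaRevision)
                .filter(OptionalLong::isPresent)
                .mapToLong(OptionalLong::getAsLong)
                .max();
    }
}
```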
When a follower/learner processes the request, it compares the request's MSR with its locally known MSR (from the Catalog). If the request's MSR > the local MSR, the request is rejected with reason EBUSY, and the leader retries it after some time. As an optimization, the follower might wait for a short time in the hope that the local MSR catches up with the request's MSR.
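The follower-side check can be sketched as a non-blocking decision. FollowerSchemaGate, AppendDecision and the LongSupplier giving the local MSR are hypothetical; REJECT_EBUSY stands for the EBUSY rejection described above, after which the leader retries.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;

/** Outcome of the schema check; REJECT_EBUSY stands for rejecting the request with reason EBUSY. */
enum AppendDecision { ACCEPT, REJECT_EBUSY }

final class FollowerSchemaGate {
    /** Hypothetical accessor for the latest schema revision known to the local Catalog (local MSR). */
    private final LongSupplier localMsr;

    FollowerSchemaGate(LongSupplier localMsr) {
        this.localMsr = localMsr;
    }

    /**
     * Decides, without waiting, whether entries requiring requestMsr can be applied right now.
     * An empty MSR means none of the entries carries a schema revision.
     */
    AppendDecision check(OptionalLong requestMsr) {
        if (requestMsr.isEmpty()) {
            return AppendDecision.ACCEPT;
        }
        return requestMsr.getAsLong() > localMsr.getAsLong()
                ? AppendDecision.REJECT_EBUSY
                : AppendDecision.ACCEPT;
    }
}
```

The optional short wait mentioned above is deliberately left out: the check itself never blocks, and any waiting would have to be bounded tightly to keep JRaft threads responsive.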
As we need an additional field in AppendEntriesRequest that only partition groups will use, we could add a generic property container to this interface, such as Map<String, Object> extras().
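A generic extras container could look like the sketch below. AppendEntriesRequestSketch, SimpleAppendEntriesRequest, PartitionExtras and the "maxSchemaRevision" key are hypothetical; the point is that non-partition groups leave extras() empty, while partition groups read the MSR through one well-known key.

```java
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical slice of the AppendEntriesRequest interface with a generic property container. */
interface AppendEntriesRequestSketch {
    long term();

    /** Group-agnostic properties; empty for groups that do not use them. */
    Map<String, Object> extras();
}

record SimpleAppendEntriesRequest(long term, Map<String, Object> extras) implements AppendEntriesRequestSketch {}

final class PartitionExtras {
    /** Hypothetical key under which partition groups store the MSR. */
    static final String MAX_SCHEMA_REVISION = "maxSchemaRevision";

    static Map<String, Object> withMsr(long msr) {
        return Map.of(MAX_SCHEMA_REVISION, msr);
    }

    /** Reads the MSR if present; requests from other groups simply yield empty. */
    static OptionalLong msr(AppendEntriesRequestSketch request) {
        Object value = request.extras().get(MAX_SCHEMA_REVISION);
        return value instanceof Long l ? OptionalLong.of(l) : OptionalLong.empty();
    }
}
```

An untyped Map keeps the RAFT layer unaware of schemas, at the cost of type safety; whatever values are stored would also need to be serializable by the network message layer.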
To extract the SR from a command, we could simply deserialize the whole command, but that does a lot of unnecessary work. Instead, commands carrying an SR might be serialized in a special way, with the SR in the very first bytes of the message, so that it can be retrieved efficiently.
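One possible prefix layout is a 1-byte flag followed, when set, by the SR as an 8-byte big-endian long, then the regular command body. SchemaRevisionPrefixCodec and this exact layout are assumptions for illustration; reading the SR touches only the header and never deserializes the body.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.OptionalLong;

/** Hypothetical codec putting the SR in front of the serialized command body. */
final class SchemaRevisionPrefixCodec {
    private static final byte NO_SR = 0;
    private static final byte HAS_SR = 1;

    static byte[] encode(OptionalLong sr, byte[] commandBody) {
        int headerSize = sr.isPresent() ? 1 + Long.BYTES : 1;
        ByteBuffer buf = ByteBuffer.allocate(headerSize + commandBody.length); // big-endian by default
        if (sr.isPresent()) {
            buf.put(HAS_SR).putLong(sr.getAsLong());
        } else {
            buf.put(NO_SR);
        }
        buf.put(commandBody);
        return buf.array();
    }

    /** Reads only the header: constant cost regardless of the command size. */
    static OptionalLong peekSchemaRevision(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        return buf.get() == HAS_SR ? OptionalLong.of(buf.getLong()) : OptionalLong.empty();
    }

    /** Returns the original command body for full deserialization when the command is applied. */
    static byte[] body(byte[] encoded) {
        int offset = encoded[0] == HAS_SR ? 1 + Long.BYTES : 1;
        return Arrays.copyOfRange(encoded, offset, encoded.length);
    }
}
```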
As the primary has already made sure that it has the schema versions needed to execute the command, no waits will be needed on the primary node while executing the RAFT command.
As secondaries/learners refuse AppendEntries that they cannot execute without waiting, they will never have to wait in JRaft threads.
The RAFT leader may not be collocated with the primary. For this case, we can add the same validation for ActionRequests: pass the required SR inside an ActionRequest, validate it in ActionRequestProcessor, and reject requests whose SR is above the local MSR.
In RO transactions
When processing an RO transaction, we just wait for MS SafeTime. This happens outside of RAFT, so no special measures are needed.
Issue Links
- is depended upon by
  - IGNITE-19824 Implicit RO should be used in implicit single gets (Resolved)
  - IGNITE-20012 Raft client freezing on stop (Resolved)
- is related to
  - IGNITE-20012 Raft client freezing on stop (Resolved)
- relates to
  - IGNITE-20256 Refuse to install Raft snapshots on partitions when not enough schemas are available (Resolved)