  1. Ignite
  2. IGNITE-19227

Wait for schema availability outside JRaft threads

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: 3.0.0-beta2
    • None

    Description

      According to IEP-98 (https://cwiki.apache.org/confluence/display/IGNITE/IEP-98%3A+Schema+Synchronization#IEP98:SchemaSynchronization-Waitingforsafetimeinthepast), we might need to wait for schema availability when fetching a schema. If such a wait happens inside a PartitionListener, JRaft threads might be blocked for a noticeable amount of time (possibly seconds). We should avoid this.

      In RW transactions

      When a primary node is going to process a request, it waits until it has all the schema versions for the corresponding timestamp T_op (beginTs or commitTs), i.e. until MS SafeTime >= T_op. The wait happens outside of JRaft threads. The node then obtains the global schema revision SR of the latest schema update that is not later than T_op, builds a command with that SR inside, and submits it to RAFT.
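
      The wait-then-build step on the primary could be sketched roughly as follows. This is only an illustration under assumptions: SchemaSafeTimeTracker and all of its methods are hypothetical names, not Ignite or JRaft API, timestamps are plain long values, and revision 0 stands for "no schema update yet". The point is that the caller gets a future and never blocks a thread while waiting for MS SafeTime to reach the operation timestamp (beginTs or commitTs).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

/** Hypothetical helper (not an Ignite class): tracks MS SafeTime and schema revisions. */
class SchemaSafeTimeTracker {
    /** Futures waiting for SafeTime to reach the key timestamp. */
    private final TreeMap<Long, CompletableFuture<Void>> waiters = new TreeMap<>();
    /** Schema updates: activation timestamp -> global schema revision (SR). */
    private final TreeMap<Long, Long> revisionByActivationTs = new TreeMap<>();
    private long safeTime;

    synchronized void registerSchemaUpdate(long activationTs, long revision) {
        revisionByActivationTs.put(activationTs, revision);
    }

    /** Returns a future that completes once SafeTime >= ts; never blocks the caller. */
    synchronized CompletableFuture<Void> waitForSafeTime(long ts) {
        if (safeTime >= ts) {
            return CompletableFuture.completedFuture(null);
        }
        return waiters.computeIfAbsent(ts, k -> new CompletableFuture<>());
    }

    /** Called when SafeTime moves forward; releases every waiter whose timestamp is reached. */
    void advanceSafeTime(long newSafeTime) {
        List<CompletableFuture<Void>> ready;
        synchronized (this) {
            safeTime = Math.max(safeTime, newSafeTime);
            NavigableMap<Long, CompletableFuture<Void>> reached = waiters.headMap(safeTime, true);
            ready = new ArrayList<>(reached.values());
            reached.clear();
        }
        // Complete outside the lock so dependent callbacks do not run while holding it.
        ready.forEach(f -> f.complete(null));
    }

    /** SR of the latest schema update not later than opTs, available after the wait. */
    CompletableFuture<Long> requiredSchemaRevision(long opTs) {
        return waitForSafeTime(opTs).thenApply(ignored -> revisionAt(opTs));
    }

    private synchronized long revisionAt(long ts) {
        Map.Entry<Long, Long> e = revisionByActivationTs.floorEntry(ts);
        return e == null ? 0L : e.getValue(); // assumption: 0 means "no schema update yet"
    }
}
```

      A caller would chain command construction and submission to RAFT onto the future returned by requiredSchemaRevision, so the command already carries its SR by the time it reaches a JRaft thread.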

      When an AppendEntriesRequest is built, the Replicator inspects all the entries included in it, extracts the SR from each of them, takes the maximum (MSR, for ‘max schema revision’), and puts it into the AppendEntriesRequest.
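
      The MSR computation described above amounts to a maximum over the entries that carry an SR. A minimal sketch, assuming a hypothetical Entry type in which schemaRevision is null for commands without an SR (the real JRaft log entry type is different):

```java
import java.util.List;
import java.util.OptionalLong;

class MaxSchemaRevision {
    /** Hypothetical log entry; schemaRevision is null when the command carries no SR. */
    static final class Entry {
        final Long schemaRevision;

        Entry(Long schemaRevision) {
            this.schemaRevision = schemaRevision;
        }
    }

    /** Returns the MSR of the batch, or empty if no entry carries an SR. */
    static OptionalLong maxSchemaRevision(List<Entry> entries) {
        return entries.stream()
                .filter(e -> e.schemaRevision != null)
                .mapToLong(e -> e.schemaRevision)
                .max();
    }
}
```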

      When the request is processed by a follower/learner, it compares the MSR from the request with its locally known MSR (in the Catalog). If the request’s MSR > local MSR, the request is rejected (with reason EBUSY) and will be retried by the leader after some time. As an optimization, the follower/learner might wait for some time in the hope that the local MSR catches up with the request’s MSR.

      As we need an additional field in AppendEntriesRequest that will only be used by partition groups, we could add a generic container for properties to this interface, like Map<String, Object> extras().
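
      Combined with the follower-side comparison, reading the MSR from such a properties container might look like the sketch below. The key name "maxSchemaRevision", the Result enum and the class itself are assumptions made for illustration; only the Map<String, Object> extras() shape and the EBUSY rejection come from the description.

```java
import java.util.Map;

class AppendEntriesSchemaCheck {
    /** Hypothetical key under which the leader stores the MSR in extras(). */
    static final String MSR_KEY = "maxSchemaRevision";

    enum Result { ACCEPT, EBUSY }

    /** Rejects the request if it needs a schema revision the local Catalog does not have yet. */
    static Result validate(Map<String, Object> extras, long localMsr) {
        Object requestMsr = extras.get(MSR_KEY);
        if (!(requestMsr instanceof Long)) {
            return Result.ACCEPT; // not a partition group request: no schema requirement
        }
        return (Long) requestMsr > localMsr ? Result.EBUSY : Result.ACCEPT;
    }
}
```

      Requests from groups that do not set the key pass through unchanged, which keeps the extra field invisible to non-partition RAFT groups.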

      To extract the SR from a command, we could deserialize it completely, but that does a lot of unnecessary work. Instead, we might serialize commands that carry an SR in a special way (putting the SR in the very first bytes of the message) to make its retrieval efficient.
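
      One possible layout for this is a fixed 8-byte prefix holding the SR, followed by the serialized command. The sketch below is an assumption about the format, not the actual Ignite command serialization; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

class SchemaRevisionPrefix {
    /** Writes the SR as the first 8 bytes (big-endian), followed by the command payload. */
    static byte[] serialize(long schemaRevision, byte[] payload) {
        return ByteBuffer.allocate(Long.BYTES + payload.length)
                .putLong(schemaRevision)
                .put(payload)
                .array();
    }

    /** Reads the SR from the prefix without touching the rest of the message. */
    static long readSchemaRevision(byte[] message) {
        return ByteBuffer.wrap(message).getLong();
    }
}
```

      With such a prefix, the Replicator only reads 8 bytes per entry instead of deserializing whole commands.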

      As the primary has already made sure that it has the schema versions needed to execute the command, no waits will be needed on the primary node while executing the RAFT command.

      As followers/learners refuse AppendEntries requests that they cannot execute without waiting, they will not have to wait in JRaft threads at all.

      It is possible that the RAFT leader is not colocated with the primary. We can add the same validation for ActionRequests: pass the required SR inside an ActionRequest, validate it in ActionRequestProcessor, and reject requests whose SR is above the local MSR.

      In RO transactions

      When processing an RO transaction, we just wait for MS SafeTime. This is done outside of RAFT, so no special measures are needed.

            People

              rpuch Roman Puchkovskiy
              rpuch Roman Puchkovskiy
              Kirill Tkalenko Kirill Tkalenko
                Time Tracking

                  Original Estimate - Not Specified
                  Remaining Estimate - 0h
                  Time Spent - 5h 20m