Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
IEP-98 states:
When creating a message M telling the cluster about a schema update activation moment, choose the message timestamp Tm (moving safeTime forward) equal to Now, but assign Tu (activation moment) contained in that M to be Tm+DD
This is hard to achieve.
Problem
We need Tu == Tm + DD. With what we currently have in IGNITE-19028, this is not straightforward, because too many actors are involved:
- The client, which chooses Tu, because it is the only actor that can affect the message content.
- The meta-storage lease-holder (or leader), which chooses Tm.
- Everybody else, who expects a correspondence between Tu and Tm.
The first two actors are the crux: they have independent clocks, yet must coordinate on the same event. This is impossible with the described protocol.
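To make the clock problem concrete, here is a minimal sketch of the original protocol. It assumes integer seconds, a 2 s skew between the client and the lease-holder, and a 1 s network delay; all names and numbers are hypothetical and only illustrate why Tu - Tm drifts away from DD.

```python
# Hypothetical model of the original protocol: the client picks Tu from its own
# clock, while the meta-storage lease-holder later picks Tm from *its* clock.
DD = 30  # delay duration in seconds (illustrative value)


def client_chooses_tu(client_now: int) -> int:
    # The client can only put values into the message body, so it computes Tu.
    return client_now + DD


def leader_chooses_tm(leader_now: int) -> int:
    # The lease-holder assigns the message timestamp when it orders the command.
    return leader_now


client_clock = 100  # client's local time when it builds the command (assumed)
clock_skew = 2      # leader's clock runs 2 s ahead of the client's (assumed)
network_delay = 1   # the command reaches the leader 1 s later (assumed)

tu = client_chooses_tu(client_clock)
tm = leader_chooses_tm(client_clock + clock_skew + network_delay)

print(f"Tu = {tu}, Tm = {tm}, Tu - Tm = {tu - tm}, expected DD = {DD}")
```

Running it prints `Tu = 130, Tm = 103, Tu - Tm = 27, expected DD = 30`: neither actor can fix the gap on its own, because each one only sees its own clock.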
Discussion
Let's consider these two solutions:
1. The client generates Tm.
2. The meta-storage generates Tu.
Option 1 is out of the question: at any given moment there must be exactly one node responsible for the linear order of timestamps in messages.
What about option 2? Since the meta-storage knows nothing about command semantics, it cannot generate any data. So this solution doesn't work either.
Solution
A combined solution could be the following:
- The client sends DD as part of the command (DD is not a constant; users can configure it if they really want to).
- The meta-storage generates Tm.
- Every node calculates Tu = Tm + DD upon receiving the update.
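The node-side part of the combined solution can be sketched as follows. `SchemaUpdate`, `MetaStorageEvent` and `activation_moment` are hypothetical names, not Ignite APIs, and the calendar date is arbitrary; the point is only that Tu is derived locally from the client-supplied DD and the lease-holder-assigned Tm.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SchemaUpdate:
    # Payload written by the client: the schema version and its delay duration.
    version: int
    dd: timedelta


@dataclass(frozen=True)
class MetaStorageEvent:
    # What a node observes: the client payload plus the Tm (safe time)
    # assigned by the meta-storage lease-holder.
    update: SchemaUpdate
    tm: datetime


def activation_moment(event: MetaStorageEvent) -> datetime:
    # Every node derives Tu on its own; nobody has to guess another actor's clock.
    return event.tm + event.update.dd


event = MetaStorageEvent(
    SchemaUpdate(version=5, dd=timedelta(seconds=30)),
    tm=datetime(2023, 5, 1, 10, 10, 0),  # arbitrary illustrative date
)
print(activation_moment(event))  # 2023-05-01 10:10:30
```

Since every node sees the same (DD, Tm) pair, every node computes the same Tu.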
This would work if nodes were never restarted. One problem remains: recovering the values of Tm from the (old) data when a node restarts.
This can be achieved by persisting safeTime along with the revision as part of the metadata, which can then be retrieved through the meta-storage service API.
In other words:
1. The client sends:
    schema.latest = 5
    schema.5.data = ...
    schema.5.dd = 30s
2. The lease-holder adds metadata to the command:
    safeTime = 10:10:00
3. The meta-storage listener writes the data:
    revision = 33
    schema.latest = 5
    schema.5.data = ...
    schema.5.dd = 30s
    revision.33.safeTime = 10:10:00
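The write path of step 3 might look roughly like this. `InMemoryMetaStorage` is a hypothetical, heavily simplified stand-in for the meta-storage state machine (not the real listener), storing each key as a (value, revision) pair and adding the `revision.N.safeTime` key next to the client data.

```python
from datetime import datetime, timedelta


class InMemoryMetaStorage:
    """Hypothetical, simplified stand-in for the meta-storage state machine."""

    def __init__(self, start_revision: int = 0) -> None:
        self.revision = start_revision
        self.data: dict[str, tuple[object, int]] = {}  # key -> (value, revision)

    def apply(self, command: dict[str, object], safe_time: datetime) -> int:
        # Each applied command gets the next revision number.
        self.revision += 1
        rev = self.revision
        for key, value in command.items():
            self.data[key] = (value, rev)
        # Persist the safe time chosen by the lease-holder under the same
        # revision, so Tm can be recovered after a restart.
        self.data[f"revision.{rev}.safeTime"] = (safe_time, rev)
        return rev


storage = InMemoryMetaStorage(start_revision=32)
rev = storage.apply(
    {"schema.latest": 5, "schema.5.data": "...", "schema.5.dd": timedelta(seconds=30)},
    safe_time=datetime(2023, 5, 1, 10, 10, 0),  # arbitrary illustrative date
)
print(rev)  # 33
```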
How to read Tu:
- read "schema.5.dd";
- read its revision, which is 33;
- read the timestamp of revision 33 via a specialized API;
- add the two values together: 10:10:00 + 30s = 10:10:30.
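The read path can be sketched over a snapshot of the state after step 3. `get_entry` and `timestamp_by_revision` are hypothetical helpers; the latter stands in for the "specialized API" and is assumed here to be a plain lookup of the `revision.N.safeTime` key.

```python
from datetime import datetime, timedelta

# Hypothetical snapshot of the meta-storage after step 3: key -> (value, revision).
entries: dict[str, tuple[object, int]] = {
    "schema.latest": (5, 33),
    "schema.5.data": ("...", 33),
    "schema.5.dd": (timedelta(seconds=30), 33),
    "revision.33.safeTime": (datetime(2023, 5, 1, 10, 10, 0), 33),
}


def get_entry(key: str) -> tuple[object, int]:
    # Returns the value together with the revision that last wrote it.
    return entries[key]


def timestamp_by_revision(revision: int) -> datetime:
    # Stand-in for the specialized API returning the safe time of a revision.
    value, _ = entries[f"revision.{revision}.safeTime"]
    return value


def read_activation_moment(schema_version: int) -> datetime:
    dd, revision = get_entry(f"schema.{schema_version}.dd")  # read DD and its revision
    tm = timestamp_by_revision(revision)                      # read Tm of that revision
    return tm + dd                                            # Tu = Tm + DD


print(read_activation_moment(5))  # 2023-05-01 10:10:30
```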
Implications and restrictions
The meta-storage has a cleanup (compaction) process. It will eventually remove "revision.x.safeTime" values once the corresponding revisions become obsolete.
However, we must preserve the timestamps of revisions that are used by schemas. This can be achieved if components are able to reserve a revision, so that the meta-storage cannot compact it until the reservation is revoked.
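A possible shape for the reservation idea is sketched below. `RevisionReservations` and `compact` are hypothetical names; reservations are reference-counted (an assumption) so that several components can pin the same revision, and compaction skips any revision that is still pinned.

```python
from collections import Counter


class RevisionReservations:
    """Hypothetical registry of revisions that compaction must not remove."""

    def __init__(self) -> None:
        self._counts: Counter[int] = Counter()

    def reserve(self, revision: int) -> None:
        self._counts[revision] += 1

    def revoke(self, revision: int) -> None:
        self._counts[revision] -= 1
        if self._counts[revision] <= 0:
            del self._counts[revision]

    def is_reserved(self, revision: int) -> bool:
        # Counter returns 0 for unknown keys without inserting them.
        return self._counts[revision] > 0


def compact(safe_times: dict[int, str], up_to_revision: int,
            reservations: RevisionReservations) -> None:
    # Drop obsolete revision timestamps unless some component reserved them.
    for rev in list(safe_times):
        if rev <= up_to_revision and not reservations.is_reserved(rev):
            del safe_times[rev]


safe_times = {31: "10:09:00", 32: "10:09:30", 33: "10:10:00"}
reservations = RevisionReservations()
reservations.reserve(33)                   # schema 5 depends on revision 33
compact(safe_times, 33, reservations)
print(safe_times)                          # {33: '10:10:00'}
reservations.revoke(33)
compact(safe_times, 33, reservations)
print(safe_times)                          # {}
```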
Issue Links
- is fixed by: IGNITE-19532 Introduce happens-before relation between local meta storage safe time publication and completion of corresponding meta storage listeners (Resolved)