Details
Sub-task
Status: Open
Major
Resolution: Unresolved
Impala 4.0.0
Description
Ensure consistency between failure detection and registration/Ack of a coordinator by the admission service.
Currently, the admission service uses statestore membership updates to detect that a coordinator has gone down, but it still services RPCs from that coordinator if the coordinator is still up and able to contact the admission service.
Using the current mechanisms of statestore updates (IMPALA-10594), admission heartbeats (IMPALA-10590, IMPALA-10720) and coordinator registration (IMPALA-9976), ensure that consistency is maintained between these mechanisms.
A possible implementation is:
- Use the statestore as the only source of truth.
- Consistency: Only allow a coord to register if it is registered with the statestore.
- Atomicity: If the statestore update signals that a coord is down, remove all its state (running and queued queries) before allowing it to register again.
OR
Eventual consistency: Remove queries between subsequent statestore updates. If the coord comes back up and sends the full admission state, update the state of each query id that has not been removed yet (the full admission state contains only running queries).
This cannot be used, because only changes to the membership initiate the query removal process, which would happen only once if a coord is removed.
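A minimal sketch of the first option (statestore as the only source of truth, with the Consistency and Atomicity rules) might look like the following. It is a toy model in Python, not Impala code: the class names `AdmissionRegistry` and `CoordinatorState` and the methods `on_statestore_update`, `register` and `handle_rpc` are all hypothetical and only illustrate the intended ordering of checks.

```python
from dataclasses import dataclass, field


@dataclass
class CoordinatorState:
    """Per-coordinator admission state (hypothetical structure)."""
    running: set = field(default_factory=set)
    queued: set = field(default_factory=set)


class AdmissionRegistry:
    """Toy model in which statestore membership is the only source of truth."""

    def __init__(self):
        self.statestore_members = set()  # coords in the latest membership update
        self.registered = {}             # coord id -> CoordinatorState

    def on_statestore_update(self, members):
        """Apply a membership update and drop all state of coords that left."""
        members = set(members)
        for coord in self.statestore_members - members:
            # Atomicity: running and queued queries are removed together,
            # before the coord gets a chance to register again.
            self.registered.pop(coord, None)
        self.statestore_members = members

    def register(self, coord):
        """Consistency: only coords known to the statestore may register."""
        if coord not in self.statestore_members:
            return False
        self.registered.setdefault(coord, CoordinatorState())
        return True

    def handle_rpc(self, coord, query_id):
        """Reject RPCs from coords that are not currently registered."""
        state = self.registered.get(coord)
        if state is None:
            return False
        state.running.add(query_id)
        return True
```

In this model, a coord that the statestore reports as down loses its state immediately, and any later RPC or registration attempt from it is rejected until a subsequent membership update lists it again. How the real admission service would surface such a rejection to the coordinator is left open here.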