[IGNITE-8020] Rebalancing for persistent caches should transfer file store over network instead of using existing supply/demand protocol - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: persistence
Labels:
- iep-16
- iep-28

Description

The existing rebalancing protocol is suitable for in-memory data storage, but for data persisted in files it is suboptimal and requires many unnecessary steps. Efforts to optimize it showed that the protocol needs to be completely reworked: instead of sending batches (SupplyMessages) of cache entries, the supplier can send the data files directly.

The algorithm should look like this:
1. The demander node sends requests with the required partition IDs (as it does now).
2. The supplier node receives the request and performs a checkpoint.
3. After the checkpoint is done, the supplier sends the files containing the demanded partitions using the low-level NIO API.
4. During steps 2-3, the demander node should work in a special mode: it should temporarily store all incoming updates in such a way that they can be applied quickly later.
5. After the files are transferred, the demander applies the updates stored in step 4.
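
Step 3 could be sketched roughly as follows. This is only an illustration of the zero-copy NIO idea, not Ignite code: the class name `PartitionFileSender` and its `send` method are hypothetical, and the sketch assumes a blocking target channel and a partition file that is not truncated while it is being sent.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for illustration only; not part of Ignite. */
public final class PartitionFileSender {
    private PartitionFileSender() {
    }

    /**
     * Sends the whole partition file to the target channel (for example, a
     * blocking SocketChannel) and returns the number of bytes sent.
     */
    public static long send(Path partFile, WritableByteChannel target) throws IOException {
        try (FileChannel src = FileChannel.open(partFile, StandardOpenOption.READ)) {
            long size = src.size();
            long pos = 0;

            // transferTo may move fewer bytes than requested, so loop until done.
            // Assumption: the target is blocking and the file is not truncated.
            while (pos < size)
                pos += src.transferTo(pos, size - pos, target);

            return pos;
        }
    }
}
```

`FileChannel.transferTo` lets the operating system copy bytes from the file to the socket without passing them through a user-space buffer where the platform supports it, which is the main reason to prefer it over building SupplyMessage batches.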

The tricky part here is switching the demander node's work modes while avoiding all possible race conditions. Also, the algorithm above should be extended to transfer or rebuild query indexes.
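
A minimal sketch of the demander-side mode switch (steps 4-5) is shown below. Everything in it is an assumption made for illustration: the class `BufferingPartition`, its methods, and the use of plain string keys and values are hypothetical, and a single monitor stands in for whatever synchronization the real implementation would use. It ignores removals, entry versions, and indexes.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical model of one partition on the demander; not Ignite code. */
public final class BufferingPartition {
    private final Map<String, String> store = new HashMap<>();
    private final Queue<String[]> pending = new ArrayDeque<>();
    private boolean buffering = true;

    /** Called for every incoming update while the node is running. */
    public synchronized void update(String key, String val) {
        if (buffering)
            pending.add(new String[] {key, val}); // Step 4: remember it for later.
        else
            store.put(key, val);                  // Normal mode: apply directly.
    }

    /** Called once, after the partition file has been received (step 5). */
    public synchronized void finishTransfer(Map<String, String> transferred) {
        store.putAll(transferred);

        // Replay buffered updates in arrival order on top of the transferred data.
        for (String[] u; (u = pending.poll()) != null; )
            store.put(u[0], u[1]);

        // Flipping the flag under the same monitor as update() means no update
        // can arrive between the end of the replay and the mode switch.
        buffering = false;
    }

    public synchronized String get(String key) {
        return store.get(key);
    }
}
```

Holding one lock for the whole replay is the simplest way to rule out the race, but it blocks incoming updates for the duration of the replay; a real implementation would likely need a finer-grained scheme.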