Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
As a first step towards estimating how much faster file-based rebalancing could be, I suggest implementing a simple partition fetch procedure via a communication SPI extension:
1) Node A sends a partition fetch request to node B
2) Node B starts a checkpoint and creates a local copy of the partition. Note that other checkpoints may be running concurrently while the partition is being copied; this must be handled properly
3) Node B establishes a new TCP connection on the TCP communication port (handshake and verification are assumed)
4) Node B calls transferFile (or a native analogue; investigation needed) to send the partition file in the most efficient way
5) Node A writes the file to a specified location on the local file system
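Steps 4 and 5 could be prototyped roughly as follows. This is only a sketch under assumptions: it uses the standard Java NIO FileChannel.transferTo / transferFrom calls as a stand-in for the "transferFile" call mentioned above, the class name PartitionFileTransfer and the 8-byte length header are hypothetical and not part of the existing communication SPI, and the channels are assumed to be in blocking mode. The checkpoint copy, handshake and verification are out of scope here.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for the experiment; not an existing Ignite class. */
final class PartitionFileTransfer {
    private PartitionFileTransfer() {
    }

    /** Sender side (node B): writes an 8-byte length header, then the file contents. */
    static long send(Path partFile, WritableByteChannel out) throws IOException {
        try (FileChannel src = FileChannel.open(partFile, StandardOpenOption.READ)) {
            long size = src.size();

            // Assumed framing: the receiver needs to know how many bytes to expect.
            ByteBuffer hdr = ByteBuffer.allocate(Long.BYTES);
            hdr.putLong(size);
            hdr.flip();
            while (hdr.hasRemaining())
                out.write(hdr);

            // transferTo may move fewer bytes than requested, so loop until done.
            long pos = 0;
            while (pos < size)
                pos += src.transferTo(pos, size - pos, out);

            return size;
        }
    }

    /** Receiver side (node A): reads the header, then writes the payload to the target file. */
    static long receive(ReadableByteChannel in, Path target) throws IOException {
        ByteBuffer hdr = ByteBuffer.allocate(Long.BYTES);
        while (hdr.hasRemaining()) {
            if (in.read(hdr) < 0)
                throw new EOFException("Connection closed before the length header arrived");
        }
        hdr.flip();
        long size = hdr.getLong();

        try (FileChannel dst = FileChannel.open(target, StandardOpenOption.CREATE,
            StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long pos = 0;
            while (pos < size) {
                // With a blocking source channel, zero bytes means end of stream.
                long n = dst.transferFrom(in, pos, size - pos);
                if (n <= 0)
                    throw new EOFException("Connection closed before the whole file arrived");
                pos += n;
            }
        }

        return size;
    }
}
```

In the experiment, node B would pass the SocketChannel of the newly established connection as the out argument and node A would pass its end as the in argument; SocketChannel implements both channel interfaces used here. Whether transferTo actually takes a zero-copy path depends on the OS and JVM, which is part of what the investigation in step 4 needs to confirm.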
Once this mechanism is implemented, we need to hack the rebalance code so that it uses the partition fetch logic instead of regular rebalance, and then measure:
1) How much faster (or slower) the new approach performs
2) How it affects the concurrent transactions in the grid