Here is a high-level description.
PeerSync currently computes the versions the recovering node is missing and then sends all of those version numbers to a replica to get the corresponding updates. When a node under recovery is missing too many updates, the payload of getUpdates goes above 2MB and Jetty rejects the request. The problem can be solved using one of the following techniques:
- Increasing the Jetty payload limit may solve this problem. We would still be sending a lot of data over the network, which might not be needed.
- Stream versions to the replica while asking for updates.
- Request versions in chunks of about 90K versions at a time.
- Gzip the versions and unzip them on the other side.
- Ask for versions using version ranges instead of sending individual versions.
Approaches #1-#3 require sending a lot of data over the wire.
Approach #3 also requires making multiple calls. Additionally, #3 might not be feasible considering how the current code works, by submitting requests to shardHandler and calling handleResponse.
#4 may work, but looks a little inelegant.
Hence I settled on approach #5 (suggested by Ramkumar). Here is how it works:
- Let's say the replica has versions [1, 2, 3, 4, 5, 6] and the leader has versions [1, 2, 3, 4, 5, 6, 10, -11, 12, 13, 15, 18]
- While recovering using the PeerSync strategy, the replica computes that the range it is missing is 10...18
- The replica now requests versions by specifying the range 10...18 instead of sending all the individual versions (namely 10, -11, 12, 13, 15, 18)
- I have made the use of version ranges for PeerSync configurable by introducing the following configuration section
- Further, I have kept it backwards compatible: a recovering node will use version ranges only if the node it asks for updates can process version ranges
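
The range idea from the example above can be sketched in Java as follows. This is only an illustration and not the actual PeerSync code: the class `VersionRangeSketch`, its `Range` type, and the methods `missingRange` and `updatesInRange` are hypothetical names. It assumes the replica already knows the leader's recent versions from an earlier exchange, that negative values are compared by absolute value, and that a single range covering everything newer than the replica's newest version is enough.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual PeerSync implementation.
final class VersionRangeSketch {

    /** Inclusive range over absolute version values, e.g. 10...18. */
    static final class Range {
        final long low;
        final long high;

        Range(long low, long high) {
            this.low = low;
            this.high = high;
        }

        /** True if the absolute value of the version lies inside the range. */
        boolean contains(long version) {
            long abs = Math.abs(version);
            return low <= abs && abs <= high;
        }

        @Override
        public String toString() {
            return low + "..." + high;
        }
    }

    /**
     * Requesting side: build one range that covers every leader version
     * newer than the newest version the replica has.
     * Returns null when nothing is missing.
     */
    static Range missingRange(List<Long> replicaVersions, List<Long> leaderVersions) {
        long newestOnReplica = 0;
        for (long v : replicaVersions) {
            newestOnReplica = Math.max(newestOnReplica, Math.abs(v));
        }
        long low = Long.MAX_VALUE;
        long high = Long.MIN_VALUE;
        for (long v : leaderVersions) {
            long abs = Math.abs(v);
            if (abs > newestOnReplica) {
                low = Math.min(low, abs);
                high = Math.max(high, abs);
            }
        }
        return low > high ? null : new Range(low, high);
    }

    /** Serving side: select the versions it holds that fall inside the range. */
    static List<Long> updatesInRange(List<Long> ownVersions, Range range) {
        List<Long> result = new ArrayList<>();
        for (long v : ownVersions) {
            if (range.contains(v)) {
                result.add(v);
            }
        }
        return result;
    }
}
```

With the versions from the example, `missingRange` yields a range with `low = 10` and `high = 18`, so the request carries two numbers regardless of how many updates fall inside the range; the serving node expands the range back into concrete updates with `updatesInRange`.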