Description
The current version of the Pipelined download strategy uses a single connection and a single thread to download from MongoDB. We can increase the download speed further by adding a second MongoDB connection. A Mongo deployment has 1 primary and 2 secondaries, so in principle we could open one connection to each secondary, which could roughly double the download speed.
There are a few points to observe:
- Connections should go to different secondaries. If both connections go to the same secondary, there is a high chance that their combined throughput will be capped by what a single replica can provide, and that the replica will be overloaded. Each secondary should therefore serve exactly one connection.
- The range of documents to download must be partitioned between the two threads. We already download from Mongo in order of (_modified, _id), so a simple and effective partition strategy for 2 connections is for one to download in ascending order and the other in descending order, until the two meet.
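The ascending/descending partition can be illustrated with a minimal in-memory sketch. It is not the Oak implementation and does not talk to MongoDB: the class names `TwoWayDownloadSketch`, `DocKey` and `Frontier` are hypothetical, and a sorted list stands in for the two cursors. It assumes keys are unique, which holds for (_modified, _id) because _id is unique. Each thread claims a key before using it and stops as soon as it reaches the position of the other thread, so every document is handled exactly once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TwoWayDownloadSketch {

    /** Hypothetical sort key mirroring the (_modified, _id) download order. */
    static final class DocKey implements Comparable<DocKey> {
        final long modified;
        final String id;

        DocKey(long modified, String id) {
            this.modified = modified;
            this.id = id;
        }

        @Override
        public int compareTo(DocKey other) {
            int c = Long.compare(modified, other.modified);
            return c != 0 ? c : id.compareTo(other.id);
        }
    }

    /** Shared state recording how far each direction has advanced. */
    static final class Frontier {
        private DocKey lastAscending;  // highest key claimed by the ascending thread
        private DocKey lastDescending; // lowest key claimed by the descending thread

        /** Returns false once the caller has reached the other thread's range. */
        synchronized boolean claim(DocKey key, boolean ascending) {
            if (ascending) {
                if (lastDescending != null && key.compareTo(lastDescending) >= 0) {
                    return false;
                }
                lastAscending = key;
            } else {
                if (lastAscending != null && key.compareTo(lastAscending) <= 0) {
                    return false;
                }
                lastDescending = key;
            }
            return true;
        }
    }

    /** Walks the sorted list in one direction; stands in for one Mongo cursor. */
    static List<DocKey> scan(List<DocKey> sorted, Frontier frontier, boolean ascending) {
        List<DocKey> claimed = new ArrayList<>();
        int n = sorted.size();
        for (int i = 0; i < n; i++) {
            DocKey key = sorted.get(ascending ? i : n - 1 - i);
            if (!frontier.claim(key, ascending)) {
                break; // the two directions have met
            }
            claimed.add(key);
        }
        return claimed;
    }

    /** Runs both directions concurrently; index 0 is ascending, index 1 descending. */
    static List<List<DocKey>> download(List<DocKey> sorted) {
        Frontier frontier = new Frontier();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<DocKey>> up = pool.submit(() -> scan(sorted, frontier, true));
            Future<List<DocKey>> down = pool.submit(() -> scan(sorted, frontier, false));
            List<List<DocKey>> parts = new ArrayList<>();
            parts.add(up.get());
            parts.add(down.get());
            return parts;
        } catch (Exception e) {
            throw new IllegalStateException("download sketch failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<DocKey> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add(new DocKey(i / 10, String.format("doc-%04d", i)));
        }
        List<List<DocKey>> parts = download(docs);
        System.out.println("ascending: " + parts.get(0).size()
                + ", descending: " + parts.get(1).size());
    }
}
```

The split between the two threads is not fixed in advance: whichever connection is faster ends up downloading more, which is the main appeal of this scheme over a static midpoint split.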
Issue Links
- is blocked by OAK-10808 PipelinedMongoConnectionFailureIT should not fail if Mongo is not available (Closed)