Jackrabbit Oak / OAK-10778

Indexing job: support parallel download from MongoDB with two connections in Pipelined strategy

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.64.0
    • Component/s: indexing
    • Labels: None

    Description

      The current version of the Pipelined download strategy uses a single connection/thread to download from MongoDB. We can further increase the download speed by using an additional MongoDB connection. A typical MongoDB replica set has 1 primary and 2 secondaries, so in principle we could open 1 connection to each secondary, potentially doubling the download speed.
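
      A minimal sketch of the "one connection per secondary" idea, assuming the replica-set topology is already known as a plain list of members. `Member`, `Role`, `assign` and the host names are hypothetical and are not part of Oak or the MongoDB Java driver; the sketch only decides which secondary each download thread should use, not how a driver connection is actually pinned to that member.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondaryAssignmentSketch {

    public enum Role { PRIMARY, SECONDARY }

    // Hypothetical description of one replica-set member (not a MongoDB driver type).
    public record Member(String host, Role role) {}

    // Gives each download thread its own secondary; never puts two threads on one member.
    public static Map<String, String> assign(List<String> threads, List<Member> members) {
        List<String> secondaries = members.stream()
                .filter(m -> m.role() == Role.SECONDARY)
                .map(Member::host)
                .sorted()
                .toList();
        if (secondaries.size() < threads.size()) {
            // Refuse to double up on a replica rather than risk overloading it.
            throw new IllegalStateException("need " + threads.size()
                    + " secondaries but found " + secondaries.size());
        }
        Map<String, String> assignment = new LinkedHashMap<>();
        for (int i = 0; i < threads.size(); i++) {
            assignment.put(threads.get(i), secondaries.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<Member> replicaSet = List.of(
                new Member("mongo-0:27017", Role.PRIMARY),
                new Member("mongo-1:27017", Role.SECONDARY),
                new Member("mongo-2:27017", Role.SECONDARY));
        // Prints {download-1=mongo-1:27017, download-2=mongo-2:27017}
        System.out.println(assign(List.of("download-1", "download-2"), replicaSet));
    }
}
```

      Failing fast when there are fewer secondaries than download threads, instead of placing two connections on the same replica, is a deliberate choice in this sketch; a real implementation might prefer to fall back to a single connection.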

      There are two points to consider:

      • Connections should go to different secondaries. If both connections go to the same secondary, there is a high chance that they will be limited by the throughput a single replica can provide, and that they will overload that replica. So each secondary should serve one and only one connection.
      • The range of documents to download must be partitioned between the two threads. We already download from Mongo in (_modified, _id) order. A simple and effective partitioning strategy for 2 connections is for one to download in ascending order and the other in descending order, stopping when they meet.
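
      The ascending/descending split can be sketched as follows, under stated assumptions: two in-memory lists stand in for two MongoDB cursors over the same collection, one sorted by (_modified, _id) ascending and the other descending, and each list is read by its own thread. `DocKey`, `Frontier`, `claimAscending`, `claimDescending`, `drain` and `download` are hypothetical names, and the lock-protected frontier is just one way to decide where the two threads meet; it is not necessarily how Oak detects the crossing point.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

public class BidirectionalDownloadSketch {

    // Hypothetical stand-in for a document's sort key: ordered by (_modified, _id).
    public record DocKey(long modified, String id) implements Comparable<DocKey> {
        @Override
        public int compareTo(DocKey o) {
            int c = Long.compare(modified, o.modified);
            return c != 0 ? c : id.compareTo(o.id);
        }
    }

    // Shared meeting point of the two threads. A key is claimed only if the
    // thread coming from the other end has not already reached it.
    static final class Frontier {
        private DocKey ascLast;  // highest key claimed so far by the ascending thread
        private DocKey descLast; // lowest key claimed so far by the descending thread

        synchronized boolean claimAscending(DocKey k) {
            if (descLast != null && k.compareTo(descLast) >= 0) return false;
            ascLast = k;
            return true;
        }

        synchronized boolean claimDescending(DocKey k) {
            if (ascLast != null && k.compareTo(ascLast) <= 0) return false;
            descLast = k;
            return true;
        }
    }

    // Reads one "cursor" in order until the next key is already owned by the other thread.
    static List<DocKey> drain(List<DocKey> cursor, Predicate<DocKey> claim) {
        List<DocKey> downloaded = new ArrayList<>();
        for (DocKey k : cursor) {
            if (!claim.test(k)) break;
            downloaded.add(k);
        }
        return downloaded;
    }

    // Downloads all keys with two threads, one ascending and one descending.
    public static List<DocKey> download(List<DocKey> sortedKeys) {
        List<DocKey> descending = new ArrayList<>(sortedKeys);
        Collections.reverse(descending);
        Frontier frontier = new Frontier();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<DocKey>> asc = pool.submit(() -> drain(sortedKeys, frontier::claimAscending));
            Future<List<DocKey>> desc = pool.submit(() -> drain(descending, frontier::claimDescending));
            List<DocKey> all = new ArrayList<>(asc.get());
            all.addAll(desc.get());
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while downloading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("download thread failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<DocKey> keys = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            keys.add(new DocKey(i / 10, String.format("doc-%05d", i)));
        }
        Collections.sort(keys);
        List<DocKey> downloaded = download(keys);
        // Prints "10000 downloaded, 10000 distinct": no gaps and no duplicates.
        System.out.println(downloaded.size() + " downloaded, "
                + new HashSet<>(downloaded).size() + " distinct");
    }
}
```

      Because each thread only claims a key that lies strictly beyond the last key claimed from the other end, every document is downloaded exactly once, and the meeting point adapts to whichever connection is faster, so the range boundaries do not need to be known in advance.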

    People

      Nuno Santos (nfsantos)
      Nuno Santos (nuno.santos)