Solr / SOLR-9961

RestoreCore needs the option to download files in parallel.

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 6.2.1
    • Fix Version/s: None
    • Component/s: Backup/Restore
    • Labels:
      None

      Description

      My backup to cloud storage (Google Cloud Storage in this case, but I think this is a general problem) takes 8 minutes, yet the restore of the same core takes hours. The restore loop in RestoreCore is serial, so I can't parallelize the expensive part of the operation: the IO from the remote cloud storage service. We need an option to download the files in parallel (like distcp).

      Also, I tried downloading the same directory using gsutil and it was very fast, about 2 minutes. So I know it's not the network pipe that's limiting performance here.

      Here's a very rough patch that does the parallelization. We may also want to consider a two-step approach: 1) download in parallel to a temp dir, 2) perform all of the checksum validation against the local temp dir. That would save round trips to the remote cloud storage.
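
      To make the two-step idea concrete, here is a minimal sketch in plain Java. It is not the attached patch and does not use Solr's backup repository API: the class name ParallelRestoreSketch, its method names, and the use of CRC32 as the checksum are all assumptions made for illustration, and the "remote" source is modelled as an ordinary directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical illustration only; names and checksum choice are assumptions.
public class ParallelRestoreSketch {

  /** Step 1: copy every named file from sourceDir into tempDir on a fixed-size thread pool. */
  public static void downloadAll(Path sourceDir, Path tempDir, List<String> fileNames, int threads)
      throws IOException, InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (String name : fileNames) {
        // Each copy is an independent task, so slow remote reads overlap.
        pending.add(pool.submit(() -> {
          Files.copy(sourceDir.resolve(name), tempDir.resolve(name),
              StandardCopyOption.REPLACE_EXISTING);
          return null;
        }));
      }
      // Wait for every copy and surface the first failure as an IOException.
      for (Future<?> f : pending) {
        try {
          f.get();
        } catch (ExecutionException e) {
          throw new IOException("Download failed", e.getCause());
        }
      }
    } finally {
      pool.shutdownNow();
    }
  }

  /** Computes the CRC32 of a local file by streaming through it once. */
  public static long crc32(Path file) throws IOException {
    try (CheckedInputStream in =
        new CheckedInputStream(Files.newInputStream(file), new CRC32())) {
      byte[] buf = new byte[8192];
      while (in.read(buf) != -1) {
        // Reading updates the checksum; the bytes themselves are discarded.
      }
      return in.getChecksum().getValue();
    }
  }

  /** Step 2: validate the local copies only, with no further remote round trips. */
  public static void verifyAll(Path tempDir, Map<String, Long> expected) throws IOException {
    for (Map.Entry<String, Long> e : expected.entrySet()) {
      long actual = crc32(tempDir.resolve(e.getKey()));
      if (actual != e.getValue()) {
        throw new IOException("Checksum mismatch for " + e.getKey());
      }
    }
  }
}
```

      A caller would run downloadAll(...) first and verifyAll(...) second, moving files into the index directory only if both succeed. The thread count would presumably be a configurable option, since the best value depends on the storage service and the available bandwidth.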

        Attachments

        1. SOLR-9961.patch
          24 kB
          Mikhail Khludnev
        2. SOLR-9961.patch
          15 kB
          Mikhail Khludnev
        3. SOLR-9961.patch
          18 kB
          Mikhail Khludnev
        4. SOLR-9961.patch
          15 kB
          Timothy Potter
        5. SOLR-9961.patch
          7 kB
          Timothy Potter

              People

              • Assignee:
                mkhl Mikhail Khludnev
                Reporter:
                thelabdude Timothy Potter
              • Votes:
                2
                Watchers:
                7