  Kudu / KUDU-2785

Support more parallel scanners in the backup job

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9.0
    • Fix Version/s: 1.10.0
    • Component/s: None
    • Labels:

      Description

      Currently, the KuduBackup job uses one scanner, and therefore one Spark task, per Kudu partition. Once KUDU-2670 is complete, we should consider (and test) using more than one scanner per partition, configuring a target data size for each scanner instead. Because the work per task would then be bounded by the target size rather than by partition size, backup jobs should become faster and more reliable and predictable, regardless of partition count.
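A minimal Python sketch of the idea, under loudly stated assumptions: each partition is modeled as an integer key range with one estimated size, data is assumed to be spread uniformly over that range, and `ScanRange`, `plan_one_per_partition` and `plan_by_target_size` are hypothetical names, not the Kudu or kudu-spark API. It contrasts the current one-task-per-partition plan with splitting each partition into key ranges of roughly a target size.

```python
from dataclasses import dataclass
from typing import List, Tuple

MiB = 1024 * 1024

# Hypothetical input shape: (partition_id, start_key, end_key, estimated_bytes).
Partition = Tuple[int, int, int, int]


@dataclass(frozen=True)
class ScanRange:
    """Hypothetical unit of work: one scanner, and therefore one Spark task."""
    partition_id: int
    start_key: int        # inclusive
    end_key: int          # exclusive
    estimated_bytes: int


def plan_one_per_partition(partitions: List[Partition]) -> List[ScanRange]:
    """Current behavior: one scanner covering each whole partition."""
    return [ScanRange(pid, start, end, size) for pid, start, end, size in partitions]


def plan_by_target_size(partitions: List[Partition], target_bytes: int) -> List[ScanRange]:
    """Split each partition's key range into pieces of roughly target_bytes.

    Assumption (hypothetical): integer primary keys and data spread uniformly
    over each partition's key range, so bytes are apportioned by key width.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    ranges = []
    for pid, start, end, size in partitions:
        width = end - start
        if width <= 0:
            raise ValueError(f"partition {pid} has an empty key range")
        # Ceiling division: fewest pieces whose average size is <= target.
        pieces = max(1, -(-size // target_bytes))
        pieces = min(pieces, width)  # each piece must hold at least one key
        for i in range(pieces):
            lo = start + width * i // pieces
            hi = start + width * (i + 1) // pieces
            # Telescoping apportionment: the pieces' estimates sum to `size`.
            est = size * (hi - start) // width - size * (lo - start) // width
            ranges.append(ScanRange(pid, lo, hi, est))
    return ranges


if __name__ == "__main__":
    # One large and one small partition.
    parts = [(0, 0, 1000, 300 * MiB), (1, 1000, 2000, 50 * MiB)]
    print(len(plan_one_per_partition(parts)))          # 2
    print(len(plan_by_target_size(parts, 128 * MiB)))  # 4
```

With two partitions, the per-partition plan always yields two tasks however skewed they are; the size-based plan yields three tasks of roughly 100 MiB for the large partition plus one 50 MiB task. The uniform-distribution assumption is where the quality of key range size estimates matters: a poor estimate produces uneven tasks even with a target size.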

      However, it may make restoring more difficult, because it could cause compactions. Restore-side testing and improvements may also be required.

      The estimation of key range sizes may also need to be improved, so this should be well tested.

              People

              • Assignee:
                Grant Henke
                Reporter:
                Grant Henke
              • Votes:
                0
                Watchers:
                1

                Dates

                • Created:
                  Updated:
                  Resolved: