[KUDU-2785] Support more parallel scanners in the backup job - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.9.0
Fix Version/s: 1.10.0
Component/s: None
Labels: backup
Target Version/s: 1.10.0

Description

Currently, the KuduBackup job uses one scanner, and therefore one Spark task, per Kudu partition. Once KUDU-2670 is complete, we should consider and test using more than one scanner per partition, sized by a configurable target amount of data per scanner rather than by partition count. That should make backup jobs faster and more reliable and predictable regardless of partition count.
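To make the sizing idea concrete, here is a minimal sketch of how a target data size per scanner could translate into a scanner (Spark task) count per partition. It is not Kudu's implementation: the function names, the partition labels, and the byte counts are all hypothetical, and the partition sizes are assumed to come from some size estimate that this sketch does not model.

```python
def scanners_per_partition(partition_bytes, target_bytes_per_scanner):
    """Return how many scanners would cover one partition (hypothetical helper)."""
    if target_bytes_per_scanner <= 0:
        raise ValueError("target size must be positive")
    # Integer ceiling division; an empty partition still gets one scanner
    # so that it is visited by the job.
    count = (partition_bytes + target_bytes_per_scanner - 1) // target_bytes_per_scanner
    return max(1, count)


def plan_backup_tasks(partition_sizes, target_bytes_per_scanner):
    """Map each partition name to its scanner count (hypothetical helper)."""
    return {
        name: scanners_per_partition(size, target_bytes_per_scanner)
        for name, size in partition_sizes.items()
    }


if __name__ == "__main__":
    # Assumed, made-up size estimates in bytes for three partitions.
    estimated_sizes = {"p0": 250, "p1": 100, "p2": 0}
    print(plan_backup_tasks(estimated_sizes, target_bytes_per_scanner=100))
```

With a fixed target size, the total task count tracks the amount of data rather than the number of partitions, which is the property the description is after. The result is only as good as the size estimates fed into it.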

However, it may make restoring more difficult because it could cause compactions. Restore-side testing and improvements may also be required.

The estimation of key range sizes may also need to be improved, so this should be well tested.

Issue Links

depends upon

KUDU-2670 Splitting more tasks for spark job, and add more concurrent for scan operation

Open

People

Assignee: Grant Henke

Reporter: Grant Henke

Votes: 0

Watchers: 1

Dates

Created: 23/Apr/19 13:49

Updated: 07/Jun/19 00:03

Resolved: 07/Jun/19 00:03