[KUDU-2786] Parallelize tables for backup and restore - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.9.0
Fix Version/s: 1.10.0
Component/s: None
Labels:
- backup

Description

Currently, the backup and restore jobs process tables serially. This ensures resources aren't over-allocated upfront, but it can be slow when there are many small tables. Instead, we could run the Spark jobs for each table in parallel.

It should be straightforward to use Scala futures to run multiple jobs in parallel and check their status. We could add a configuration option to cap the number of tables processed at the same time, though that may not really be needed.
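
A minimal sketch of the idea, not the actual Kudu implementation: each table's job runs in a Scala `Future` on a fixed-size thread pool, so the pool size acts as the cap on concurrently processed tables. The names `ParallelTables`, `runAll`, `processTable`, and `maxParallel` are hypothetical, and `processTable` is only a stand-in for launching the real per-table Spark job.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import scala.util.Try

object ParallelTables {
  // Hypothetical stand-in for the per-table backup or restore Spark job.
  // It fails on an empty table name so the failure path can be exercised.
  def processTable(table: String): Unit = {
    if (table.isEmpty) throw new IllegalArgumentException("empty table name")
  }

  // Runs `job` once per table, with at most `maxParallel` tables in flight,
  // and returns the outcome of every table keyed by table name.
  def runAll(tables: Seq[String], maxParallel: Int)(
      job: String => Unit): Map[String, Try[Unit]] = {
    // The fixed pool size is what caps the number of concurrent jobs.
    val pool = Executors.newFixedThreadPool(maxParallel)
    implicit val ec: ExecutionContext =
      ExecutionContext.fromExecutorService(pool)
    try {
      // Submit every table up front; extra tables queue inside the pool.
      val futures = tables.map(t => t -> Future(job(t)))
      // Wait for each job and record success or failure per table.
      futures.map { case (t, f) =>
        t -> Try(Await.result(f, Duration.Inf))
      }.toMap
    } finally {
      pool.shutdown()
    }
  }
}
```

Calling `ParallelTables.runAll(tableNames, 4)(ParallelTables.processTable)` would process up to four tables at once. Returning a `Try` per table keeps one table's failure from hiding the status of the others; whether the overall job should then fail is a separate decision.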

Issue Links

relates to: KUDU-2787 Allow single table failures for backup and restore (Resolved)

People

Assignee: William Berkeley

Reporter: Grant Henke

Votes: 0

Watchers: 2

Dates

Created: 23/Apr/19 15:20

Updated: 06/Jun/19 14:36

Resolved: 06/Jun/19 14:36