[ARROW-10582] [Rust] [DataFusion] Implement "repartition" operator - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0
Component/s: Rust, Rust - DataFusion
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/26545

Description

The repartition operator should read batches from its input partitions and then map that data to its output partitions using a specific partitioning scheme.

The simplest and most efficient partitioning scheme would be a "round-robin batch partitioner": for each input batch, it picks the next output partition in turn and writes the whole batch there. This is a convenient way to increase or decrease the number of partitions with minimal overhead.
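A minimal sketch of the round-robin idea, assuming batches are opaque values of some type `B`. The names `RoundRobin`, `next_partition`, and `repartition_round_robin` are hypothetical and chosen for illustration only; they are not DataFusion's API, and a real operator would stream batches rather than collect them into vectors.

```rust
/// Hypothetical round-robin selector (illustrative, not DataFusion's API).
struct RoundRobin {
    next: usize,
    num_outputs: usize,
}

impl RoundRobin {
    fn new(num_outputs: usize) -> Self {
        assert!(num_outputs > 0, "need at least one output partition");
        Self { next: 0, num_outputs }
    }

    /// Returns the output partition index for the next incoming batch.
    fn next_partition(&mut self) -> usize {
        let p = self.next;
        self.next = (self.next + 1) % self.num_outputs;
        p
    }
}

/// Reads every batch from every input partition and assigns each whole
/// batch to the next output partition in turn.
fn repartition_round_robin<B>(inputs: Vec<Vec<B>>, num_outputs: usize) -> Vec<Vec<B>> {
    let mut rr = RoundRobin::new(num_outputs);
    let mut outputs: Vec<Vec<B>> = (0..num_outputs).map(|_| Vec::new()).collect();
    for partition in inputs {
        for batch in partition {
            outputs[rr.next_partition()].push(batch);
        }
    }
    outputs
}
```

In this sketch each batch is moved as a unit and no row is inspected, so the per-batch cost is a counter increment; the same function works whether `num_outputs` is larger or smaller than the number of input partitions.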

Another example of a partitioning scheme would be a hash partitioner, which computes a hash of the partition keys for each incoming row and then takes that hash modulo the number of output partitions to determine which output partition the row is written to.
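A minimal sketch of hash partitioning, assuming each row is modeled as a `(key, value)` tuple and using the standard library's `DefaultHasher`. The function names `partition_for_key` and `repartition_hash` are hypothetical, and the hash function a real implementation uses may well differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashes the partition key and applies a modulus to pick an output partition.
/// (Hypothetical helper for illustration.)
fn partition_for_key<K: Hash>(key: &K, num_outputs: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_outputs as u64) as usize
}

/// Routes each row to the output partition selected by its key's hash.
fn repartition_hash<K: Hash, V>(rows: Vec<(K, V)>, num_outputs: usize) -> Vec<Vec<(K, V)>> {
    assert!(num_outputs > 0, "need at least one output partition");
    let mut outputs: Vec<Vec<(K, V)>> = (0..num_outputs).map(|_| Vec::new()).collect();
    for (key, value) in rows {
        let p = partition_for_key(&key, num_outputs);
        outputs[p].push((key, value));
    }
    outputs
}
```

Unlike the round-robin approach, this works per row rather than per batch, and rows with equal keys always land in the same output partition.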

Issue Links

links to

GitHub Pull Request #8982

People

Assignee: Andy Grove
Reporter: Andy Grove
Votes: 0
Watchers: 3

Dates

Created: 14/Nov/20 15:33
Updated: 11/Jan/23 08:14
Resolved: 24/Dec/20 16:37

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 4h 10m