[BEAM-11881] DataFrame subpartitioning order is incorrect - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: P2
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.29.0
Component/s: dsl-dataframe, sdk-py-core
Labels:
- dataframe-api

Description

Currently we've defined

Nothing() < Index([i]) < Index([i,j]) < .. < Index() < Singleton()

such that Singleton is a subpartitioning of Index(), which is in turn a subpartitioning of Index([i,j]), and so on, but this is incorrect. The order should be

Singleton() < Index([i]) < Index([i,j]) < .. < Index() < Nothing()

such that every other partitioning is a subpartitioning of Singleton. This is logical: Singleton collects the largest amount of data on a single node, partitioning by a single index is a little more distributed, and partitioning by the full Index() is the most distributed.
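
The corrected order can be illustrated with a small, self-contained sketch. This is not the Beam implementation: the classes below are hypothetical stand-ins that only borrow the names used in this issue, and modelling Index as a tuple of index levels (with None meaning the full index) is an assumption made for the example. The function returns True when its first argument sits at or above its second argument in the corrected order.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Singleton:
    """Hypothetical stand-in: all data collected on a single node."""


@dataclass(frozen=True)
class Index:
    """Hypothetical stand-in: partitioned by some index levels.

    levels=None is assumed to mean "the full index", i.e. Index().
    """
    levels: Optional[Tuple[int, ...]] = None


@dataclass(frozen=True)
class Nothing:
    """Hypothetical stand-in: no constraint on how data is distributed."""


def is_subpartitioning_of(a, b) -> bool:
    """Return True if `a` is at or above `b` in the corrected order
    Singleton() < Index([i]) < Index([i,j]) < .. < Index() < Nothing().
    """
    if isinstance(b, Singleton):
        return True   # everything is a subpartitioning of Singleton
    if isinstance(a, Nothing):
        return True   # Nothing sits at the top of the order
    if isinstance(a, Singleton) or isinstance(b, Nothing):
        return False  # a is strictly below b in these cases
    # Both are Index: a must cover every level that b partitions by.
    if a.levels is None:
        return True
    if b.levels is None:
        return False
    return set(b.levels) <= set(a.levels)


# Walk the chain from the issue, least distributed first.
chain = [Singleton(), Index((0,)), Index((0, 1)), Index(), Nothing()]
for lower, higher in zip(chain, chain[1:]):
    assert is_subpartitioning_of(higher, lower)
    assert not is_subpartitioning_of(lower, higher)
```

Under the order originally described in this issue, the Singleton and Nothing cases would be swapped, which is the behaviour the fix reverses.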

Issue Links

is related to

BEAM-11628 Implement GroupBy.apply

Triage Needed

links to

GitHub Pull Request #14135

People

Assignee: Brian Hulette

Reporter: Brian Hulette

Votes: 0

Watchers: 2

Dates

Created: 27/Feb/21 00:04

Updated: 22/Jun/21 19:21

Resolved: 16/Mar/21 14:08

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 7h 20m