Details
- Bug
- Status: Resolved
- P2
- Resolution: Fixed
- None
- None
Description
I noticed that sample() requires data to be repartitioned when it's used at the beginning of a series of DataFrame commands. In practice, we should be able to sample within arbitrary partitions before combining the partitions to produce the final result.
It looks like the root cause is that our sample expressions require partitioning by index, rather than arbitrary partitioning.
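To illustrate the idea of sampling within arbitrary partitions and then combining, here is a minimal sketch in plain Python. It is not the project's implementation: the function names (`sample_partition`, `combine_samples`, `sample_rows`) are hypothetical, and partitions are modeled as plain lists of rows. Each row gets an independent random key, each partition keeps only its n smallest keys, and the combine step keeps the n smallest keys overall. Since the n globally smallest keys always survive the per-partition step, the result should be a uniform sample without replacement regardless of how the rows were partitioned.

```python
import heapq
import random


def sample_partition(rows, n, rng):
    """Tag each row with a random key; keep the n rows with the smallest keys."""
    keyed = ((rng.random(), row) for row in rows)
    return heapq.nsmallest(n, keyed, key=lambda kr: kr[0])


def combine_samples(partials, n):
    """Merge per-partition candidates and keep the n smallest keys overall."""
    merged = (kr for part in partials for kr in part)
    return [row for _, row in heapq.nsmallest(n, merged, key=lambda kr: kr[0])]


def sample_rows(partitions, n, seed=None):
    """Sample n rows without replacement from arbitrarily partitioned data."""
    rng = random.Random(seed)
    # Step 1: runs independently per partition, no repartitioning needed.
    partials = [sample_partition(part, n, rng) for part in partitions]
    # Step 2: a single combine over at most n candidates per partition.
    return combine_samples(partials, n)


# Hypothetical data: three partitions of unequal size.
partitions = [[0, 1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11]]
result = sample_rows(partitions, n=3, seed=0)
```

Only the combine step needs to see data from more than one partition, and it handles at most n candidate rows per partition, so the bulk of the work stays where the data already lives.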