Pig / PIG-3648

Make the sample size for RandomSampleLoader configurable

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: impl
    • Labels: None

      Description

      Pig uses RandomSampleLoader for range partitioning in order-by. But since the sample size is hardcoded as 100, volatility in the variance of the results increases when sorting a large number of rows (e.g. 10M+ per task).

      It would be nice if the sample size could be configurable via Pig properties.
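      For illustration only (a sketch, not the actual implementation; "pig.random.sampler.sample.size" is the property name introduced by the patch attached below, and the exact plumbing may differ), the sample size could then be passed in like any other Pig property, e.g. programmatically via PigServer:

{code:java}
import java.util.Properties;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class ConfigurableSampleSizeExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical value; the default of 100 sampled rows per task is unchanged.
        props.setProperty("pig.random.sampler.sample.size", "1000");

        PigServer pig = new PigServer(ExecType.MAPREDUCE, props);
        pig.registerQuery("A = LOAD 'input' AS (key:chararray, value:long);");
        pig.registerQuery("B = ORDER A BY key;");   // order-by triggers the sampling job
        pig.store("B", "output");
    }
}
{code}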

      Attachments

      1. PIG-3648-1.patch (3 kB, Cheolsoo Park)

        Activity

        Aniket Mokashi added a comment -

        Thanks Cheolsoo Park, it would be great if you could share rough numbers on the benefits of this setting. That would give us some guidance on configuring this value.

        Cheolsoo Park added a comment -

        Committed to trunk. Thank you Daniel!

        Daniel Dai added a comment -

        +1, that should be fine.

        Cheolsoo Park added a comment -

        Actually, this turned out to be helpful. Since I am not changing the default behavior (i.e. the number of sampled rows per task is still 100), I think we can commit this to 0.13. Does anyone disagree?

        Cheolsoo Park added a comment -

        I suspect that there is not much benefit in keeping this configurable.

        I am deploying this at work to let my users experiment. I will let you know whether this helps or not. In the meantime, we can leave it as is.

        Rohini Palaniswamy added a comment -

        I don't think it is possible to configure it right unless we store statistics on the total number of records (using something like hraven) and use that to determine the sample size as a proportion dynamically. Otherwise the best option is to let the user specify a sample size as we don't know the number of records until the map completes.

        On a different note, while checking the code to confirm that the samples only contain the order-by columns, I saw that MR does RandomSampleLoader -> Foreach (to project the sort columns) because it is a loader. In Tez, Daniel Dai had fixed it to do POForeach -> POReservoirSample, projecting the columns early.

        Aniket Mokashi added a comment -

        Memory is not a concern. I suspect that there is not much benefit in keeping this configurable. Obviously, there is no harm if we configure it right.

        Rohini Palaniswamy added a comment -

        Is memory really a big concern w.r.t. the number of samples? I haven't checked, but I am assuming we will only have the order-by key in the sample and not the entire tuple, and that would not amount to much in terms of memory. Is that wrong?

        Aniket Mokashi added a comment - edited

        We are using reservoir sampling here, with the assumption that the samples fit in memory. My only question/concern is how much benefit increasing the sample size provides here.
        In your example, 100 samples on 13M rows had 10x skew. Do 200 samples bring it down to 5x skew or less? If so, doing this definitely makes sense.
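        As a reference for the discussion, a minimal sketch of the reservoir-sampling idea (plain Algorithm R under the stated in-memory assumption, not Pig's actual RandomSampleLoader/POReservoirSample code):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Minimal Algorithm R sketch: keeps a uniform sample of at most k items in memory. */
public class ReservoirSketch<T> {
    private final List<T> reservoir = new ArrayList<T>();
    private final Random rand = new Random();
    private final int k;
    private long seen = 0;

    public ReservoirSketch(int k) {
        this.k = k;
    }

    public void add(T item) {
        seen++;
        if (reservoir.size() < k) {
            reservoir.add(item);                         // fill the reservoir first
        } else {
            long j = (long) (rand.nextDouble() * seen);  // uniform index in [0, seen)
            if (j < k) {
                reservoir.set((int) j, item);            // replace with probability k/seen
            }
        }
    }

    public List<T> sample() {
        return reservoir;
    }
}
{code}

        Memory grows with k (the sample size), not with the number of input rows, which is why the sample size rather than the input size is what needs to be bounded.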

        Rohini Palaniswamy added a comment -

        +1. Agree that it obviously makes sense to have a larger sample size for larger data - http://en.wikipedia.org/wiki/Sample_size_determination#Introduction.

        Matt Bossenbroek added a comment -

        I came across this issue while trying to order data with a large number of reducers. The last reducer ended up with 10x the data of the other reducers and took 10x longer to execute.

        I ran some statistical simulations on the selection algo and found that with a small sample size, the likelihood of a less than uniform sample distribution was higher. In my example, it was selecting only 100 rows out of 13M, which wasn't representative of the data.

        I can provide the sample code I was using to test this if needed.
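        A rough simulation along these lines (a sketch only, not the actual test code; the 13M-row and 100-sample figures are taken from the example above) could look like:

{code:java}
import java.util.Arrays;
import java.util.Random;

/**
 * Sketch: draw k sample keys out of n, use the sample quantiles as reducer
 * boundaries, and report the largest partition relative to the ideal n/r rows.
 */
public class SkewSimulation {
    public static void main(String[] args) {
        int n = 13_000_000, k = 100, reducers = 100;
        Random rand = new Random();

        double[] keys = new double[n];
        for (int i = 0; i < n; i++) keys[i] = rand.nextDouble();

        // Simple random sample of k keys, sorted.
        double[] sample = new double[k];
        for (int i = 0; i < k; i++) sample[i] = keys[rand.nextInt(n)];
        Arrays.sort(sample);

        // Every (k / reducers)-th sample value becomes a partition boundary.
        double[] bounds = new double[reducers - 1];
        for (int r = 1; r < reducers; r++) bounds[r - 1] = sample[r * k / reducers];

        // Count how many keys land in each partition.
        long[] counts = new long[reducers];
        for (double key : keys) {
            int pos = Arrays.binarySearch(bounds, key);
            counts[pos >= 0 ? pos : -pos - 1]++;
        }

        long max = Arrays.stream(counts).max().getAsLong();
        System.out.printf("largest partition = %.1fx the ideal size%n",
                max / (double) (n / reducers));
    }
}
{code}

        Re-running with a larger k should shrink the spread of the partition sizes, which is the behavior a configurable sample size is meant to expose.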

        Aniket Mokashi added a comment - edited

        +1.

        There is a typo in the comments - RandomeSampleLoader; otherwise, the patch looks good.

        volatility in the variance of the results increases when sorting a large number of rows

        Can you give an example of when this happens? The sampling algo looks good to me. Also, we want to keep the number of samples small, so that we can replace this mechanism in the future if needed.

        Cheolsoo Park added a comment -

        Attached is a patch that introduces a new property called "pig.random.sampler.sample.size".


          People

          • Assignee: Cheolsoo Park
          • Reporter: Cheolsoo Park
          • Votes: 0
          • Watchers: 5
