Description
Pig uses RandomSampleLoader for range partitioning in order-by. But since the sample size is hardcoded as 100, volatility in the variance of the results increases when sorting a large number of rows (e.g. 10M+ per task).
It would be nice if the sample size could be configurable via Pig properties.