## Description

Pig uses `RandomSampleLoader` to sample keys for range partitioning in `ORDER BY`. Because the sample size is hardcoded as 100, the results become increasingly volatile (their variance grows) when sorting a large number of rows (e.g., 10M+ per task).
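To illustrate why the sample size matters, here is a minimal Java sketch of sample-based range partitioning. It draws a uniform sample with reservoir sampling (Algorithm R), takes evenly spaced ranks of the sorted sample as cut points, and counts how many rows land in each range. This is a generic technique chosen for illustration, not Pig's actual `RandomSampleLoader` code; the class `RangeSampler` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration of range partitioning from a sample; not Pig's implementation. */
final class RangeSampler {

    /** Reservoir sampling (Algorithm R): returns a uniform random sample of up to k rows. */
    static long[] reservoirSample(long[] rows, int k, Random rnd) {
        long[] reservoir = Arrays.copyOf(rows, Math.min(k, rows.length));
        for (int i = k; i < rows.length; i++) {
            int j = rnd.nextInt(i + 1); // uniform in [0, i]
            if (j < k) {
                reservoir[j] = rows[i]; // replace a random slot with probability k / (i + 1)
            }
        }
        return reservoir;
    }

    /** Picks numPartitions - 1 cut points at evenly spaced ranks of the sorted sample. */
    static long[] cutPoints(long[] sample, int numPartitions) {
        if (numPartitions < 1 || sample.length < numPartitions) {
            throw new IllegalArgumentException("sample must hold at least numPartitions keys");
        }
        long[] sorted = sample.clone();
        Arrays.sort(sorted);
        long[] cuts = new long[numPartitions - 1];
        for (int p = 1; p < numPartitions; p++) {
            cuts[p - 1] = sorted[(int) ((long) p * sorted.length / numPartitions)];
        }
        return cuts;
    }

    /** Counts rows per range: partition i holds keys in [cuts[i - 1], cuts[i]). */
    static long[] partitionSizes(long[] rows, long[] cuts) {
        long[] sizes = new long[cuts.length + 1];
        for (long row : rows) {
            int idx = Arrays.binarySearch(cuts, row);
            // Exact match on a cut point goes to the range that starts at that cut;
            // otherwise the insertion point is the number of cuts below the key.
            int part = idx >= 0 ? idx + 1 : -idx - 1;
            sizes[part]++;
        }
        return sizes;
    }
}
```

Comparing `partitionSizes` for cut points derived from a 100-row sample against those from a much larger sample of the same input is one way to gauge how much the sample size affects partition balance.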

It would be nice if the sample size could be configurable via Pig properties.
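A minimal sketch of what reading such a setting might look like, assuming it is exposed as a plain property with the current hardcoded value (100) as the default. The key `pig.random.sampler.sample.size` and the class `SampleSizeConfig` are illustrative assumptions here, not a confirmed Pig API.

```java
import java.util.Properties;

/** Hypothetical helper showing a configurable sample size with a backward-compatible default. */
final class SampleSizeConfig {
    // Illustrative property key (an assumption, not necessarily the name Pig uses).
    static final String SAMPLE_SIZE_KEY = "pig.random.sampler.sample.size";
    // Same value as the currently hardcoded sample size.
    static final int DEFAULT_SAMPLE_SIZE = 100;

    /** Returns the configured sample size, or the default if unset, non-numeric, or non-positive. */
    static int sampleSize(Properties props) {
        String raw = props.getProperty(SAMPLE_SIZE_KEY);
        if (raw == null) {
            return DEFAULT_SAMPLE_SIZE;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_SAMPLE_SIZE;
        } catch (NumberFormatException e) {
            return DEFAULT_SAMPLE_SIZE;
        }
    }
}
```

Falling back to the default keeps existing jobs unchanged when the property is absent; failing fast on an invalid value would be an equally reasonable choice.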

Thanks, Cheolsoo Park. It would be great if you could share rough numbers on the benefits of this setting; that would give us some guidance on how to configure the value.