Pig / PIG-3648

Make the sample size for RandomSampleLoader configurable

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: impl
    • Labels: None

    Description

      Pig uses RandomSampleLoader for range partitioning in order-by. Because the sample size is hardcoded as 100, the variance of the results increases when sorting a large number of rows (e.g. 10M+ per task).

      It would be nice if the sample size could be configurable via Pig properties.
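
      The request above could be sketched roughly as follows. This is a minimal illustration, not the attached patch: the class name, the method name, and the property key `pig.random.sampler.sample.size` are assumptions made for the example, and only the default of 100 comes from the description.

```java
import java.util.Properties;

public class SampleSizeConfig {
    // Assumed property key, for illustration only; this issue text does not name one.
    static final String SAMPLE_SIZE_KEY = "pig.random.sampler.sample.size";

    // The hardcoded sample size mentioned in the description.
    static final int DEFAULT_SAMPLE_SIZE = 100;

    // Returns the configured sample size, falling back to the default
    // when the property is missing, non-numeric, or not positive.
    static int resolveSampleSize(Properties props) {
        String raw = props.getProperty(SAMPLE_SIZE_KEY);
        if (raw == null) {
            return DEFAULT_SAMPLE_SIZE;
        }
        try {
            int n = Integer.parseInt(raw.trim());
            return n > 0 ? n : DEFAULT_SAMPLE_SIZE;
        } catch (NumberFormatException e) {
            return DEFAULT_SAMPLE_SIZE;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolveSampleSize(props)); // property unset: default applies

        props.setProperty(SAMPLE_SIZE_KEY, "1000");
        System.out.println(resolveSampleSize(props)); // property set: configured value applies
    }
}
```

      Falling back to 100 keeps existing jobs unchanged unless a user opts in to a larger sample.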

      Attachments

        1. PIG-3648-1.patch
          3 kB
          Cheolsoo Park

          People

            Assignee: Cheolsoo Park (cheolsoo)
            Reporter: Cheolsoo Park (cheolsoo)
            Votes: 0
            Watchers: 5
