Spark / SPARK-2568

RangePartitioner should go through the data only once

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.0
    • Component/s: Spark Core
    • Labels: None

    Description

      As of Spark 1.0, RangePartitioner scans the data twice: once to count the records and once to sample them. As a result, sortByKey makes three passes over the data (one to count, one to sample, and one to sort).

      RangePartitioner should scan the data only once, by eliminating the separate count pass.
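
      One way to drop the separate count pass is to sample and count in the same scan, for example with reservoir sampling per partition. The following is a minimal sketch in plain Python, not Spark's actual implementation; the function name `reservoir_sample_and_count`, the parameter `k`, and the `partitions` list are hypothetical and stand in for an RDD's partitions.

```python
import random


def reservoir_sample_and_count(items, k, seed=0):
    """Return (sample, count) after a single pass over `items`.

    `sample` holds at most `k` items chosen uniformly at random;
    `count` is the total number of items seen.
    """
    rng = random.Random(seed)
    reservoir = []
    count = 0
    for item in items:
        count += 1
        if len(reservoir) < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / count by
            # overwriting a uniformly chosen slot.
            j = rng.randrange(count)
            if j < k:
                reservoir[j] = item
    return reservoir, count


# Hypothetical partitions standing in for an RDD's partitions.
partitions = [range(0, 1000), range(1000, 1500)]
results = [
    reservoir_sample_and_count(part, k=20, seed=i)
    for i, part in enumerate(partitions)
]
# The per-partition counts come for free from the sampling pass.
total = sum(count for _, count in results)
```

      Because each partition reports its own count alongside its sample, a sampled item from a partition can be weighted by that partition's count divided by its sample size when choosing range bounds, so no extra pass is needed to learn the sizes.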

            People

              Assignee: Xiangrui Meng (mengxr)
              Reporter: Reynold Xin (rxin)