SPARK-2568

RangePartitioner should go through the data only once

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.0
    • Component/s: Spark Core
      Description

      As of Spark 1.0, RangePartitioner makes two passes over the data: one to compute the count and one to do the sampling. As a result, sortByKey makes three passes over the data (one to count, one to sample, and one to sort).

      RangePartitioner should go through data only once (remove the count step).
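
      One way to drop the separate count step is to sample each partition with reservoir sampling, which yields the element count as a by-product of the same pass. The sketch below is a plain-Python illustration of that idea, not Spark's actual code: the function names `reservoir_sample_and_count` and `range_bounds`, their parameters, and the use of in-memory lists as stand-ins for partitions are all assumptions made for the example.

      ```python
      import random
      from typing import Iterable, List, Sequence, Tuple, TypeVar

      T = TypeVar("T")


      def reservoir_sample_and_count(
          items: Iterable[T], k: int, seed: int = 0
      ) -> Tuple[List[T], int]:
          """Return (sample of at most k items, total item count) in a single pass."""
          rng = random.Random(seed)
          reservoir: List[T] = []
          count = 0
          for item in items:
              count += 1
              if len(reservoir) < k:
                  # Fill the reservoir with the first k items.
                  reservoir.append(item)
              else:
                  # Keep the new item with probability k / count.
                  j = rng.randrange(count)
                  if j < k:
                      reservoir[j] = item
          return reservoir, count


      def range_bounds(
          partitions: Sequence[Iterable[T]],
          num_ranges: int,
          sample_per_partition: int,
          seed: int = 0,
      ) -> List[T]:
          """Hypothetical helper: pick num_ranges - 1 split keys from weighted samples."""
          weighted = []
          for idx, part in enumerate(partitions):
              sample, n = reservoir_sample_and_count(part, sample_per_partition, seed + idx)
              if sample:
                  # Each sampled key stands in for n / len(sample) original keys.
                  weight = n / len(sample)
                  weighted.extend((key, weight) for key in sample)
          weighted.sort(key=lambda kw: kw[0])

          total = sum(w for _, w in weighted)
          step = total / num_ranges
          bounds: List[T] = []
          cumulative = 0.0
          target = step
          for key, w in weighted:
              cumulative += w
              if cumulative >= target and len(bounds) < num_ranges - 1:
                  # Skip duplicate keys so the bounds stay strictly increasing.
                  if not bounds or key > bounds[-1]:
                      bounds.append(key)
                      target += step
          return bounds
      ```

      Because the count comes out of the sampling pass, the per-partition weights needed to balance the ranges are available without scanning the data a second time. In this sketch every partition is sampled at a fixed size; how to size the samples and what to do with heavily skewed partitions are design choices left out here.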

              People

              • Assignee: Xiangrui Meng (mengxr)
              • Reporter: Reynold Xin (rxin)