Spark / SPARK-4640

FixedRangePartitioner for partitioning items with a known range


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      For the large datasets I work with, it is common to have lightweight keys and very heavy values (integers and large double arrays, for example). The keys, however, are known in advance and unchanging. It would be nice if Spark had a built-in partitioner that could take advantage of this. A FixedRangePartitioner[T](keys: Seq[T], partitions: Int) would be ideal. Furthermore, this partitioner type could be extended to a PartitionerWithKnownKeys that provides a getAllKeys function, allowing the list of keys to be obtained without querying the entire RDD.
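
      A minimal sketch of what the proposed API might look like, assuming Spark's existing org.apache.spark.Partitioner contract; the range-splitting and unknown-key fallback logic below are illustrative assumptions, not part of the proposal:

      import org.apache.spark.Partitioner

      // Hypothetical sketch: the key set and its order are fixed up front, so each
      // key's partition can be computed without sampling or scanning the RDD.
      class FixedRangePartitioner[T](keys: Seq[T], partitions: Int) extends Partitioner {
        require(partitions > 0, "partitions must be positive")

        // Split the known key sequence into `partitions` contiguous ranges and
        // precompute a key -> partition index lookup table.
        private val keysPerPartition =
          math.max(1, math.ceil(keys.size.toDouble / partitions).toInt)
        private val keyToPartition: Map[Any, Int] =
          keys.zipWithIndex.map { case (k, i) => (k: Any) -> (i / keysPerPartition) }.toMap

        override def numPartitions: Int = partitions

        // Unknown keys fall back to partition 0 in this sketch.
        override def getPartition(key: Any): Int = keyToPartition.getOrElse(key, 0)
      }

      // The proposed extension: a partitioner that can enumerate its keys without
      // querying the entire RDD.
      trait PartitionerWithKnownKeys[T] extends Partitioner {
        def getAllKeys: Seq[T]
      }

      An RDD of pairs could then be partitioned with rdd.partitionBy(new FixedRangePartitioner(knownKeys, 8)), avoiding the sampling pass that RangePartitioner performs over the data to determine its range bounds.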

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: Kevin Mader (skicavs)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: