SPARK-2032

Add an RDD.samplePartitions method for partition-level sampling


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Component/s: Spark Core

    Description

      This would allow us to sample a percentage of the partitions without having to materialize all of them. The result is less uniform than record-level sampling, but it is much faster and may be useful for quickly exploring data.
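
      Since this issue was resolved as Won't Fix, the following is only a minimal sketch of how partition-level sampling could be approximated outside of Spark Core, not an API that Spark ships. The names `PartitionSampling`, `choosePartitions`, and `samplePartitions`, and the `fraction` and `seed` parameters, are hypothetical. The sketch picks a random subset of partition indices on the driver and relies on the existing `PartitionPruningRDD` (a developer API) so that partitions outside the subset are never computed.

```scala
import scala.util.Random

import org.apache.spark.rdd.{PartitionPruningRDD, RDD}

// Hypothetical helper object; not part of Spark.
object PartitionSampling {

  /** Choose roughly `fraction` of the partition indices, without replacement.
    * At least one partition is kept whenever fraction > 0 and the RDD is non-empty.
    */
  def choosePartitions(numPartitions: Int, fraction: Double, seed: Long): Set[Int] = {
    require(fraction >= 0.0 && fraction <= 1.0, s"fraction must be in [0, 1], got $fraction")
    val target = math.round(fraction * numPartitions).toInt
    val count = if (fraction > 0.0 && numPartitions > 0) math.max(1, target) else 0
    // Shuffle all indices with a fixed seed, then keep the first `count`.
    new Random(seed).shuffle((0 until numPartitions).toList).take(count).toSet
  }

  /** Return an RDD containing every record of the chosen partitions and nothing else.
    * PartitionPruningRDD drops the other partitions, so no tasks run for them.
    */
  def samplePartitions[T](rdd: RDD[T], fraction: Double, seed: Long): RDD[T] = {
    val keep = choosePartitions(rdd.partitions.length, fraction, seed)
    PartitionPruningRDD.create(rdd, (index: Int) => keep.contains(index))
  }
}
```

      For example, `PartitionSampling.samplePartitions(rdd, 0.1, 42L)` would keep about one partition in ten. Whole partitions are either kept or dropped, so if records are clustered by partition (for instance, sorted input files) the sample can be heavily skewed. That is the loss of uniformity the description mentions.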


          People

            Prashant Sharma
            Matei Alexandru Zaharia
            Votes: 0
            Watchers: 2
