SPARK-8893: Require positive partition counts in RDD.repartition


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version: 1.4.0
    • Fix Version: 1.5.0
    • Component: Spark Core
    • Labels: None

    Description

      What does sc.parallelize(1 to 3).repartition(p).collect return? I would expect Array(1, 2, 3) regardless of p. But if p < 1, it returns Array(). I think instead it should throw an IllegalArgumentException.

      I think the case is pretty clear for p < 0. But the behavior for p = 0 is also error-prone. In fact, that is how I found this strange behavior: I used rdd.repartition(a/b) with positive a and b, but the integer division a/b was rounded down to zero, and the results surprised me. I'd prefer an exception over unexpected (corrupt) results.
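      The pitfall can be sketched with a toy model in plain Python (a hypothetical sketch for illustration, not Spark's actual implementation): a repartition that silently builds zero buckets makes collect return nothing, while an up-front positive-count check, like the one proposed here, surfaces the bug immediately.

      ```python
      def repartition(data, num_partitions):
          """Toy model of RDD.repartition: round-robin elements into buckets.

          The guard mirrors the fix proposed in this issue: reject
          non-positive partition counts instead of returning no data.
          """
          if num_partitions <= 0:
              raise ValueError(
                  f"Number of partitions ({num_partitions}) must be positive.")
          buckets = [[] for _ in range(num_partitions)]
          for i, x in enumerate(data):
              buckets[i % num_partitions].append(x)
          return buckets

      def collect(buckets):
          # Concatenate all partitions back into one list, like RDD.collect.
          return [x for bucket in buckets for x in bucket]

      # Without the guard, repartition(data, a // b) with a=1, b=2 would build
      # zero buckets, and collect() would silently return [] -- corrupt results.
      a, b = 1, 2
      try:
          repartition([1, 2, 3], a // b)  # a // b == 0, so this raises
      except ValueError as e:
          print(e)

      # With a valid count, all elements survive the round trip
      # (order across partitions may differ from the input, as in Spark).
      print(sorted(collect(repartition([1, 2, 3], 2))))  # [1, 2, 3]
      ```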

      I'm happy to send a pull request for this.


              People

                Assignee: Daniel Darabos (darabos)
                Reporter: Daniel Darabos (darabos)
                Votes: 1
                Watchers: 2
