Spark / SPARK-48656

ArrayIndexOutOfBoundsException in CartesianRDD getPartitions


Details

    Description

      ```scala
      val rdd1 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 65536)
      val rdd2 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 65536)
      rdd2.cartesian(rdd1).partitions
      ```

      Throws `ArrayIndexOutOfBoundsException: 0` at CartesianRDD.scala:69: the total partition count (65536 * 65536 = 2^32) overflows Int and wraps to 0, so the partition array built in `getPartitions` has length 0 and the first write at index `s1.index * numPartitionsInRdd2 + s2.index` fails. We should provide a better error message indicating that the number of partitions overflows, so it's easier for the user to debug.
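      The overflow can be seen without a Spark cluster: with 2^16 partitions on each side, the 32-bit product wraps to zero. The sketch below is illustrative only (the object name and the `Math.multiplyExact` guard are assumptions, not the actual patch); it shows the arithmetic and one way an explicit, actionable error could be raised instead.

      ```scala
      // Standalone sketch of the Int overflow behind SPARK-48656 (not Spark code).
      object CartesianOverflowSketch {
        def main(args: Array[String]): Unit = {
          val numPartitionsInRdd1 = 65536
          val numPartitionsInRdd2 = 65536

          // 65536 * 65536 = 2^32 wraps to 0 in Int arithmetic, so the partition
          // array in getPartitions would be allocated with length 0 and the first
          // write (index 0) throws ArrayIndexOutOfBoundsException: 0.
          println(numPartitionsInRdd1 * numPartitionsInRdd2) // prints 0

          // Hypothetical guard: detect the overflow up front and fail with a
          // message that names the partition counts.
          try {
            Math.multiplyExact(numPartitionsInRdd1, numPartitionsInRdd2)
          } catch {
            case _: ArithmeticException =>
              println(s"Cartesian product of $numPartitionsInRdd1 x $numPartitionsInRdd2 " +
                "partitions exceeds Int.MaxValue; coalesce the parent RDDs first.")
          }
        }
      }
      ```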


          People

            Assignee: Wei Guo
            Reporter: Nick Young
