Description
For VertexRDD's with differing partition sizes one cannot run commands like `diff` as it will thrown an IllegalArgumentException. The code below provides an example:
import org.apache.spark.graphx._ import org.apache.spark.rdd._ val setA: VertexRDD[Int] = VertexRDD(sc.parallelize(0L until 3L).map(id => (id, id.toInt+1))) setA.collect.foreach(println(_)) val setB: VertexRDD[Int] = VertexRDD(sc.parallelize(2L until 4L).map(id => (id, id.toInt+2))) setB.collect.foreach(println(_)) val diff = setA.diff(setB) diff.collect.foreach(println(_)) val setC: VertexRDD[Int] = VertexRDD(sc.parallelize(2L until 4L).map(id => (id, id.toInt+2)) ++ sc.parallelize(6L until 8L).map(id => (id, id.toInt+2))) setA.diff(setC).collect // java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
Attachments
Issue Links
- relates to
-
SPARK-5351 Can't zip RDDs with unequal numbers of partitions in ReplicatedVertexView.upgrade()
- Resolved
-
SPARK-1955 VertexRDD can incorrectly assume index sharing
- Resolved
- links to