Spark / SPARK-5790

Add tests for: VertexRDDs won't zip properly for `diff` capability


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: GraphX, Tests
    • Labels: None

    Description

      For VertexRDDs with differing numbers of partitions, one cannot run commands like `diff`, as they throw an IllegalArgumentException. The code below provides an example:

      // Run in spark-shell, where `sc` is the predefined SparkContext.
      import org.apache.spark.graphx._
      import org.apache.spark.rdd._

      val setA: VertexRDD[Int] = VertexRDD(sc.parallelize(0L until 3L).map(id => (id, id.toInt + 1)))
      setA.collect.foreach(println)
      val setB: VertexRDD[Int] = VertexRDD(sc.parallelize(2L until 4L).map(id => (id, id.toInt + 2)))
      setB.collect.foreach(println)

      // Works: setA and setB are built the same way, so they have the same number of partitions.
      val diff = setA.diff(setB)
      diff.collect.foreach(println)

      // setC is built from the union (++) of two RDDs, so it has twice as many partitions as setA.
      val setC: VertexRDD[Int] = VertexRDD(
        sc.parallelize(2L until 4L).map(id => (id, id.toInt + 2)) ++
          sc.parallelize(6L until 8L).map(id => (id, id.toInt + 2)))
      setA.diff(setC).collect
      // java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
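      The failure comes from combining the two sides partition by partition, which only works when both have the same number of partitions. The following is a Spark-free sketch of that mechanism, not Spark's implementation: `ZipDiffSketch`, `partition`, `zipPartitions`, `diff` and `diffAligned` are hypothetical names, partitions are modeled as a `Vector` of maps, and `diff` is assumed to keep only ids present on both sides whose values differ, taking the value from `other`. Re-partitioning `other` to match `self` before zipping is shown as one possible workaround, not necessarily the change that resolved this issue.

```scala
// Hypothetical, Spark-free model of zipping partitioned vertex data.
object ZipDiffSketch {
  type VertexId = Long
  // Each partition holds the (id -> value) pairs whose hash lands in it.
  type Partitioned[V] = Vector[Map[VertexId, V]]

  // Hash-partition pairs into `numParts` buckets, in the spirit of a hash partitioner.
  def partition[V](pairs: Seq[(VertexId, V)], numParts: Int): Partitioned[V] = {
    require(numParts > 0, "numParts must be positive")
    val buckets = pairs.groupBy { case (id, _) => Math.floorMod(id.hashCode, numParts) }
    Vector.tabulate(numParts)(i => buckets.getOrElse(i, Seq.empty).toMap)
  }

  // Combine partition i of `left` with partition i of `right`; this only makes
  // sense when both sides have the same number of partitions.
  def zipPartitions[A, B, C](left: Partitioned[A], right: Partitioned[B])(
      f: (Map[VertexId, A], Map[VertexId, B]) => Map[VertexId, C]): Partitioned[C] = {
    if (left.size != right.size)
      throw new IllegalArgumentException(
        "Can't zip RDDs with unequal numbers of partitions")
    left.zip(right).map { case (l, r) => f(l, r) }
  }

  // Assumed diff semantics: ids present on both sides with differing values,
  // keeping the value from `other`.
  def diff[V](self: Partitioned[V], other: Partitioned[V]): Partitioned[V] =
    zipPartitions(self, other) { (l, r) =>
      r.filter { case (id, v) => l.get(id).exists(_ != v) }
    }

  // Workaround sketch: re-hash `other` into as many partitions as `self`,
  // so matching ids land in matching partitions before zipping.
  def diffAligned[V](self: Partitioned[V], other: Partitioned[V]): Partitioned[V] =
    diff(self, partition(other.flatMap(_.toSeq), self.size))
}
```

      For example, modeling setA with 2 partitions and setC with 4, `ZipDiffSketch.diff` throws the same IllegalArgumentException as the report, while `ZipDiffSketch.diffAligned` succeeds because both sides are hashed with the same function into the same number of partitions.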
      


            People

              Assignee: Brennon York (boyork)
              Reporter: Brennon York (boyork)
              Votes: 0
              Watchers: 4
