Spark / SPARK-26757

GraphX EdgeRDDImpl and VertexRDDImpl `count` method cannot handle empty RDDs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.1, 2.3.2, 2.4.0
    • Fix Version/s: 2.3.3, 2.4.1, 3.0.0
    • Component/s: GraphX
    • Labels:
      None

      Description

      The EdgeRDDImpl and VertexRDDImpl types provided by GraphX throw a java.lang.UnsupportedOperationException: empty collection exception when count is called on an empty instance; they should return 0 instead.

      import org.apache.spark.graphx.{Graph, Edge}
      val graph = Graph.fromEdges(sc.emptyRDD[Edge[Unit]], 0)
      graph.vertices.count
      graph.edges.count
      

      Running that code in a spark-shell:

      scala> import org.apache.spark.graphx.{Graph, Edge}
      import org.apache.spark.graphx.{Graph, Edge}
      
      scala> val graph = Graph.fromEdges(sc.emptyRDD[Edge[Unit]], 0)
      graph: org.apache.spark.graphx.Graph[Int,Unit] = org.apache.spark.graphx.impl.GraphImpl@6879e983
      
      scala> graph.vertices.count
      java.lang.UnsupportedOperationException: empty collection
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$apply$36.apply(RDD.scala:1031)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$apply$36.apply(RDD.scala:1031)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1031)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.reduce(RDD.scala:1011)
        at org.apache.spark.graphx.impl.VertexRDDImpl.count(VertexRDDImpl.scala:90)
        ... 49 elided
      
      scala> graph.edges.count
      java.lang.UnsupportedOperationException: empty collection
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$apply$36.apply(RDD.scala:1031)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$apply$36.apply(RDD.scala:1031)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1031)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.reduce(RDD.scala:1011)
        at org.apache.spark.graphx.impl.EdgeRDDImpl.count(EdgeRDDImpl.scala:90)
        ... 49 elided
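
      The stack traces point at the count implementations on both classes, which sum per-partition sizes with RDD.reduce; reduce throws on an RDD with no elements, while fold(0)(_ + _) returns the zero element for an empty RDD. A minimal sketch of that style of fix (the surrounding class bodies and the exact partitionsRDD element types are assumed from the GraphX sources, not reproduced from the merged patch):

      ```scala
      // Sketch only: these methods live in org.apache.spark.graphx.impl.
      // reduce(_ + _) on an empty RDD throws
      // java.lang.UnsupportedOperationException: empty collection,
      // whereas fold(0)(_ + _) returns the zero element instead.

      // In VertexRDDImpl (partitionsRDD holds one vertex partition per element):
      override def count(): Long = {
        partitionsRDD.map(_.size.toLong).fold(0)(_ + _)
      }

      // In EdgeRDDImpl (partitionsRDD holds (PartitionID, EdgePartition) pairs):
      override def count(): Long = {
        partitionsRDD.map(_._2.size.toLong).fold(0)(_ + _)
      }
      ```

      With this change, the reproduction above returns 0 from both graph.vertices.count and graph.edges.count instead of throwing.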
      


      People

      • Assignee: Huon Wilson (huonw)
      • Reporter: Huon Wilson (huonw)
      • Votes: 0
      • Watchers: 1
