Spark / SPARK-26757

GraphX EdgeRDDImpl and VertexRDDImpl `count` method cannot handle empty RDDs

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.1, 2.3.2, 2.4.0
    • Fix Version/s: 2.3.3, 2.4.1, 3.0.0
    • Component/s: GraphX
    • Labels: None

    Description

      The EdgeRDDImpl and VertexRDDImpl types provided by GraphX throw a java.lang.UnsupportedOperationException ("empty collection") when count is called on an empty instance. They should return 0 instead. A minimal reproduction:

      import org.apache.spark.graphx.{Graph, Edge}
      val graph = Graph.fromEdges(sc.emptyRDD[Edge[Unit]], 0)
      graph.vertices.count
      graph.edges.count
      

      Running that code in a spark-shell:

      scala> import org.apache.spark.graphx.{Graph, Edge}
      import org.apache.spark.graphx.{Graph, Edge}
      
      scala> val graph = Graph.fromEdges(sc.emptyRDD[Edge[Unit]], 0)
      graph: org.apache.spark.graphx.Graph[Int,Unit] = org.apache.spark.graphx.impl.GraphImpl@6879e983
      
      scala> graph.vertices.count
      java.lang.UnsupportedOperationException: empty collection
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$apply$36.apply(RDD.scala:1031)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$apply$36.apply(RDD.scala:1031)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1031)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.reduce(RDD.scala:1011)
        at org.apache.spark.graphx.impl.VertexRDDImpl.count(VertexRDDImpl.scala:90)
        ... 49 elided
      
      scala> graph.edges.count
      java.lang.UnsupportedOperationException: empty collection
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$apply$36.apply(RDD.scala:1031)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$apply$36.apply(RDD.scala:1031)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1031)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.reduce(RDD.scala:1011)
        at org.apache.spark.graphx.impl.EdgeRDDImpl.count(EdgeRDDImpl.scala:90)
        ... 49 elided
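
Both stack traces end in RDD.reduce, called from the count implementations. RDD.reduce has no identity element to fall back on, so it throws when the RDD contains no elements. The sketch below reproduces that behaviour outside GraphX and contrasts it with RDD.fold, which starts from a supplied zero value. It is an illustration only, not necessarily the change that resolved this issue; the names sizes, viaReduce and viaFold are hypothetical, and sc is assumed to be the SparkContext that spark-shell predefines.

```scala
// Run in a spark-shell, where `sc` is already defined (as in the transcript above).
import scala.util.Try

// Per-partition element counts of an empty RDD. An empty RDD has no
// partitions, so this RDD of counts contains no elements either.
val sizes = sc.emptyRDD[Int].mapPartitions(it => Iterator(it.size.toLong))

// reduce has nothing to combine and no default, so it throws
// UnsupportedOperationException on an empty RDD; Try captures the failure.
val viaReduce = Try(sizes.reduce(_ + _))

// fold starts from the given zero value, so an empty RDD simply yields it.
val viaFold = sizes.fold(0L)(_ + _)
```

Summing with fold (or calling RDD.sum on the counts) keeps the result well defined for empty inputs, which is the behaviour the report asks of count.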
      

            People

              Assignee: Huon Wilson (huonw)
              Reporter: Huon Wilson (huonw)