Spark / SPARK-17877

Cannot checkpoint the graph returned by connectedComponents

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 1.5.2, 1.6.2, 2.0.1, 2.3.0
    • Fix Version/s: None
    • Component/s: GraphX

    Description

      The following code, run in spark-shell, demonstrates the issue:

      import org.apache.spark.graphx._

      // Vertex (id -> name) and edge RDDs for a small graph; `sc` is the spark-shell SparkContext
      val users = sc.parallelize(List(3L -> "lucas", 7L -> "john", 5L -> "matt", 2L -> "kelly"))
      val rel = sc.parallelize(List(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"), Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
      sc.setCheckpointDir("/tmp/check")
      
      // Mark a freshly built graph for checkpointing
      val g = Graph(users, rel)
      g.checkpoint()   // /tmp/check/b1f46ba5-357a-4d6d-8f4d-411b64b27c2f appears
      
      // Mark the result of connectedComponents for checkpointing
      val gg = g.connectedComponents()
      gg.checkpoint()
      
      // Materialize both RDDs, then inspect the checkpoint flag
      gg.vertices.collect()
      gg.edges.collect()
      gg.isCheckpointed  // res5: Boolean = false; /tmp/check still contains only one folder, b1f46ba5-357a-4d6d-8f4d-411b64b27c2f
      

      I think the last line should return true instead of false.
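
      A possible workaround, sketched under assumptions rather than taken from this report: the RDD.checkpoint documentation requires the call to be made before any job has executed on the RDD, and connectedComponents is implemented on top of Pregel, which already runs jobs on the graph it returns. If that is the cause, rebuilding a graph from fresh RDDs and checkpointing it before the first action may help. The helper name checkpointCopy is hypothetical, and the sketch has not been verified against the affected versions.

```scala
import org.apache.spark.graphx._
import scala.reflect.ClassTag

// Hypothetical helper (not part of GraphX): copy a graph into fresh RDDs,
// mark the copy for checkpointing, then materialize it.
// Assumes sc.setCheckpointDir(...) has already been called.
def checkpointCopy[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VD, ED] = {
  // map(identity) yields new RDDs on which no job has run yet
  val fresh = Graph(graph.vertices.map(identity), graph.edges.map(identity))
  fresh.checkpoint()     // mark before any action touches the new RDDs
  fresh.vertices.count() // first actions trigger the actual checkpoint write
  fresh.edges.count()
  fresh
}
```

      Usage in the session above would be `val ggc = checkpointCopy(gg)` followed by `ggc.isCheckpointed`. The copy costs an extra pass over the vertices and edges, which is the price of obtaining RDDs that have not yet been computed.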

            People

              Assignee: Unassigned
              Reporter: Alexander Pivovarov (apivovarov)
              Votes: 0
              Watchers: 5
