  1. Spark
  2. SPARK-2025

EdgeRDD persists after pregel iteration



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 1.0.1
    • Fix Version/s: 1.0.1, 1.1.0
    • Component/s: GraphX
    • Environment: RHEL6 on local and on spark cluster


      Symptoms: While a Pregel script or function is running, a copy of an intermediate EdgeRDD object stays persisted after each iteration, as shown on the Storage tab of the Spark web UI.

      This behaves like a memory leak in the Pregel function.

      For example, after the first iteration I will have an EdgeRDD in addition to the EdgeRDD and VertexRDD that are kept for the next iteration. After 15 iterations I will have 15 EdgeRDDs in addition to the current/correct state represented by a single set of 1 EdgeRDD and 1 VertexRDD.
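      The build-up described above can also be checked from code instead of the web UI. The following is a minimal sketch, not part of the original report: it lists every RDD the driver currently has marked as persistent, using `SparkContext.getPersistentRDDs`. The object name `PersistentRdds`, the helper `describe`, and the `probe` RDD are hypothetical names chosen for illustration, and the local master is an assumption.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PersistentRdds {
  // One line per RDD currently marked persistent: id, name, storage level.
  def describe(sc: SparkContext): Seq[String] =
    sc.getPersistentRDDs.toSeq.sortBy(_._1).map { case (id, rdd) =>
      // RDD names are optional, so guard against null.
      val name = Option(rdd.name).getOrElse("<unnamed>")
      s"$id\t$name\t${rdd.getStorageLevel.description}"
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, purely for demonstration.
    val sc = new SparkContext(
      new SparkConf().setAppName("persistent-rdds").setMaster("local[2]"))
    try {
      // Hypothetical cached RDD standing in for per-iteration graph state.
      val probe = sc.parallelize(1 to 100).setName("probe").cache()
      probe.count() // materialize the cached data
      describe(sc).foreach(println)
    } finally sc.stop()
  }
}
```

      Calling `describe(sc)` before and after a Pregel run, and comparing the number of entries, should make any leftover per-iteration RDDs visible.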

      At the end of a Pregel loop the old EdgeRDD and VertexRDD are unpersisted, but there seems to be another EdgeRDD that is created somewhere that does not get unpersisted.

      I think this comes from the replicateVertex function, but I cannot be sure.

      Update: Ankur Dave says, in a comment on SPARK-2011:

      ... is a bug introduced by https://github.com/apache/spark/pull/497.
      It occurs because unpersistVertices used to unpersist both the vertices and the replicated vertices, but after unifying replicated vertices with edges, there was no way to unpersist only one of them. I think the solution is just to unpersist both the vertices and the edges in Pregel.
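      The suggested remedy, unpersisting both the vertices and the edges of the previous graph, can be sketched at user level as follows. This is an illustration under stated assumptions, not the actual patch to Pregel: the object `UnpersistBothSketch`, the function `runLoop`, the 3-cycle graph, and the `mapVertices` step standing in for a superstep are all hypothetical. Only `Graph.unpersistVertices` and `RDD.unpersist` on `graph.edges` are the calls the comment refers to.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object UnpersistBothSketch {
  // Runs `iterations` toy supersteps, releasing the previous graph each time.
  def runLoop(sc: SparkContext, iterations: Int): Graph[Int, Int] = {
    // Hypothetical input: a 3-cycle whose vertices all start at 0.
    val edges = sc.parallelize(
      Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1)))
    var g: Graph[Int, Int] = Graph.fromEdges(edges, defaultValue = 0).cache()
    g.vertices.count()
    g.edges.count()

    for (_ <- 1 to iterations) {
      val prevG = g
      // Stand-in for one superstep: derive the next graph and cache it.
      g = prevG.mapVertices((_, attr) => attr + 1).cache()
      // Materialize the new graph before dropping the old one.
      g.vertices.count()
      g.edges.count()
      // Release both halves of the previous graph, not only the vertices.
      prevG.unpersistVertices(blocking = false)
      prevG.edges.unpersist(blocking = false)
    }
    g
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, purely for demonstration.
    val sc = new SparkContext(
      new SparkConf().setAppName("unpersist-both").setMaster("local[2]"))
    try {
      val result = runLoop(sc, iterations = 5)
      result.vertices.collect().foreach(println)
      println(s"persistent RDDs after loop: ${sc.getPersistentRDDs.size}")
    } finally sc.stop()
  }
}
```

      Materializing the new graph before unpersisting the old one avoids recomputing the previous iteration from lineage; the final count printed is for inspection only and depends on the Spark version.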




            Assignee: Ankur Dave (ankurd)
            Reporter: Tim Weninger (tweninge)