Spark / SPARK-11432

Personalized PageRank shouldn't use uniform initialization


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.5.1
    • Fix Version/s: 1.4.2, 1.5.3, 1.6.0
    • Component/s: GraphX
    • Labels:
      None
    • Target Version/s:

      Description

      The current implementation of personalized PageRank in GraphX uses uniform initialization over the full graph: every vertex is initially activated.

      For example:

      // Run in spark-shell, which provides the SparkContext `sc` used below
      import org.apache.spark._
      import org.apache.spark.graphx._
      import org.apache.spark.rdd.RDD
      val users: RDD[(VertexId, (String, String))] =
        sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
                             (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
      val relationships: RDD[Edge[String]] =
        sc.parallelize(Array(Edge(3L, 7L, "collab"),    Edge(5L, 3L, "advisor"),
                             Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
      val defaultUser = ("John Doe", "Missing")
      val graph = Graph(users, relationships, defaultUser)
      // numIter = 0, so this prints the initial ranks of every vertex
      graph.staticPersonalizedPageRank(3L, 0, 0.15).vertices.collect.foreach(println)
      

      This sets every vertex to resetProb (0.15), which differs from the behavior described in SPARK-5854, where only the source node should be activated.
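      A minimal, Spark-free sketch of the two initialization schemes, applied to the vertex ids from the example above. The object `PprInit` and its helpers `uniform` and `sourceOnly` are hypothetical names used for illustration; this is not GraphX code. With zero iterations, as in the `staticPersonalizedPageRank(3L, 0, 0.15)` call, the initial values should be the entire result.

```scala
// Hypothetical helpers (not GraphX code) contrasting the two
// initialization schemes on the vertex ids from the example.
object PprInit {
  type VertexId = Long

  val vertexIds: Seq[VertexId] = Seq(2L, 3L, 5L, 7L)

  // Behavior reported in this issue: every vertex starts at resetProb.
  def uniform(resetProb: Double)(id: VertexId): Double = resetProb

  // Behavior described in SPARK-5854: only the source vertex is activated.
  def sourceOnly(src: VertexId, resetProb: Double)(id: VertexId): Double =
    if (id == src) resetProb else 0.0

  def main(args: Array[String]): Unit = {
    // With zero iterations, these initial values are all that is returned.
    println(vertexIds.map(id => id -> uniform(0.15)(id)))
    println(vertexIds.map(id => id -> sourceOnly(3L, 0.15)(id)))
  }
}
```

      Under the source-only scheme, 3L starts at 0.15 and 2L, 5L and 7L start at 0.0, so nothing is activated except through propagation from the source.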

      The risk is that, after a few iterations, the most activated nodes are the source node and the nodes the propagation never reached. In the example above, vertex 2L will always have an activation of 0.15:

      graph.personalizedPageRank(3L, 0, 0.15).vertices.collect.foreach(println)
      

      This gives 2L a higher score than 7L and 5L, even though there is no outbound path from 3L to 2L.
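      The mechanism can be illustrated with a simplified, Spark-free model of static personalized PageRank on the example graph. The update rule here is an assumption, not GraphX's implementation: each edge sends rank(src) / outDegree(src) to its destination, a vertex that receives messages becomes reset(id) + (1 - resetProb) * sum, and a vertex that receives no messages keeps its previous rank. The object name `PprSketch` and its `run` function are hypothetical. The sketch only shows why a vertex with no in-edges (2L) keeps its initial value; it is not expected to reproduce GraphX's exact scores.

```scala
// Simplified model (assumed update rule, not GraphX code) of static
// personalized PageRank on the graph from the example.
object PprSketch {
  type VertexId = Long

  val vertexIds: Seq[VertexId] = Seq(2L, 3L, 5L, 7L)
  // Edges from the example, as (src, dst).
  val edges: Seq[(VertexId, VertexId)] = Seq(3L -> 7L, 5L -> 3L, 2L -> 5L, 5L -> 7L)
  val outDegree: Map[VertexId, Int] =
    edges.groupBy(_._1).map { case (v, out) => v -> out.size }

  def run(src: VertexId, numIter: Int, resetProb: Double,
          init: VertexId => Double): Map[VertexId, Double] = {
    var ranks: Map[VertexId, Double] = vertexIds.map(v => v -> init(v)).toMap
    for (_ <- 1 to numIter) {
      // Sum the incoming contributions for each destination vertex.
      val msgSums: Map[VertexId, Double] = edges
        .map { case (s, d) => d -> ranks(s) / outDegree(s) }
        .groupBy(_._1)
        .map { case (d, msgs) => d -> msgs.map(_._2).sum }
      ranks = ranks.map { case (v, oldRank) =>
        val reset = if (v == src) resetProb else 0.0
        // Vertices with no in-edges (2L here) never receive messages,
        // so they keep whatever value they were initialized with.
        v -> msgSums.get(v).map(sum => reset + (1.0 - resetProb) * sum).getOrElse(oldRank)
      }
    }
    ranks
  }

  def main(args: Array[String]): Unit = {
    val uniform = run(3L, 20, 0.15, _ => 0.15)
    val sourceOnly = run(3L, 20, 0.15, id => if (id == 3L) 0.15 else 0.0)
    println(s"uniform:     $uniform")
    println(s"source-only: $sourceOnly")
  }
}
```

      In this model, uniform initialization leaves 2L at 0.15 forever and also gives the unreachable 5L a nonzero score, while source-only initialization leaves 2L and 5L at 0.0 and assigns nonzero rank only to 3L and 7L, the vertices reachable from the source.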

            People

            • Assignee: Yves Raimond (yraimond)
            • Reporter: Yves Raimond (yraimond)
            • Shepherd: DB Tsai
            • Votes: 0
            • Watchers: 4