  Spark / SPARK-20429

[GRAPHX] Strange results for personalized pagerank if node is involved in a cycle


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: GraphX

    Description

      I'm running GraphX's personalized PageRank (PPR) implementation on the following simple test graph:

      Image: https://i.stack.imgur.com/JDv1l.jpg

      I'm confused by some of the results I get when I compute the PPR for a starting node that is involved in a cycle. For example, the final output for starting node 12, as (vertex ID, PPR score) pairs, is:

      (13, 0.0141)
      (7, 0.0141)
      (19, 0.0153)
      (17, 0.0153)
      (20, 0.0153)
      (11, 0.0391)
      (14, 0.0460)
      (15, 0.0541)
      (16, 0.0541)
      (12, 0.1832)

      I would expect node 13 to have a much higher PPR value; in fact, I would expect it to rank first after the starting node itself. The same problem appears with other nodes involved in cycles: for example, with starting node 13, node 15 gets a very low score. In all my testing, starting nodes that do not participate in a cycle give exactly the results I expect.
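      To sanity-check this kind of expectation independently of GraphX, here is a minimal pure-Python sketch of personalized PageRank computed by power iteration. It is not GraphX's implementation. The function name `personalized_pagerank`, the reset probability of 0.15 (a common default), and the rule that sink vertices send their mass back to the source are all assumptions for illustration. The 4-vertex graph is hypothetical, because the reporter's test graph is only available as an image.

```python
def personalized_pagerank(out_edges, source, reset_prob=0.15, tol=1e-10, max_iter=1000):
    """Personalized PageRank by power iteration (illustrative sketch).

    out_edges: dict mapping EVERY vertex id to a list of its out-neighbours
    (every target must also appear as a key).
    At each step a random walker jumps back to `source` with probability
    `reset_prob`; otherwise it follows a uniformly chosen out-edge.
    Assumption: a walker on a sink (no out-edges) also jumps back to `source`,
    so the total score always sums to 1.
    """
    nodes = list(out_edges)
    rank = {v: 0.0 for v in nodes}
    rank[source] = 1.0  # all mass starts on the source vertex
    for _ in range(max_iter):
        new = {v: 0.0 for v in nodes}
        new[source] += reset_prob  # teleport (restart) mass
        for v in nodes:
            mass = (1.0 - reset_prob) * rank[v]
            targets = out_edges[v]
            if targets:
                share = mass / len(targets)
                for t in targets:
                    new[t] += share
            else:
                new[source] += mass  # sink: send its mass back to the source
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:  # L1 change small enough: treat as converged
            break
    return rank


# Hypothetical graph: cycle 1 -> 2 -> 3 -> 1, plus a branch 3 -> 4 (a sink).
graph = {1: [2], 2: [3], 3: [1, 4], 4: []}
ranks = personalized_pagerank(graph, source=1)
for v, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(v, round(score, 4))
```

      In this sketch, vertex 2 (the next hop on the cycle from the source) scores just below the source and ahead of vertices 3 and 4, which is the kind of ordering the report expects for node 13 when starting from node 12.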


            People

              Assignee: Unassigned
              Reporter: Francesco Elia (jackduluoz)
              Votes: 0
              Watchers: 2
