Spark / SPARK-20429

[GRAPHX] Strange results for personalized pagerank if node is involved in a cycle

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: GraphX
    • Labels:

      Description

      I'm trying to run the personalized PageRank implementation of GraphX on a simple test graph, which is the following:

      Image: https://i.stack.imgur.com/JDv1l.jpg

      I'm a bit confused by some of the results I get when I compute the PPR for a node that is involved in a cycle. For example, the final output for starting node 12 is as follows:

      (13, 0.0141)
      (7, 0.0141)
      (19, 0.0153)
      (17, 0.0153)
      (20, 0.0153)
      (11, 0.0391)
      (14, 0.0460)
      (15, 0.0541)
      (16, 0.0541)
      (12, 0.1832)

      I would clearly expect node 13 to have a much higher PPR value (in fact, I would expect it to rank first after the starting node itself). The problem also appears with other nodes involved in cycles; for example, with starting node 13, node 15 gets a very low score. In all the testing I have done, starting nodes that do not participate in a cycle give exactly the results I expect.
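
To make the expectation above concrete, here is a minimal sketch of personalized PageRank computed by plain power iteration, independent of GraphX. Everything in it is an assumption for illustration: the function name `personalized_pagerank`, the reset probability of 0.15, the choice to send the mass of dangling nodes back to the source, and above all the edge list, which is a small hypothetical graph with a cycle through node 12 and not the graph in the image. Under this formulation, a source whose only out-edge goes to one successor passes (1 - reset probability) of its own score to that successor, which is why the successor would be expected to rank directly after the source.

```python
def personalized_pagerank(edges, source, reset_prob=0.15, num_iter=100):
    """Power iteration for personalized PageRank restarting at `source`.

    Illustrative sketch only; this is not the GraphX implementation.
    """
    nodes = sorted({v for edge in edges for v in edge} | {source})
    out_edges = {v: [] for v in nodes}
    for src, dst in edges:
        out_edges[src].append(dst)

    # Start with all probability mass on the source node.
    rank = {v: 1.0 if v == source else 0.0 for v in nodes}
    for _ in range(num_iter):
        nxt = {v: 0.0 for v in nodes}
        for v in nodes:
            follow = (1.0 - reset_prob) * rank[v]
            if out_edges[v]:
                # Split the "follow an edge" mass evenly over out-neighbours.
                share = follow / len(out_edges[v])
                for w in out_edges[v]:
                    nxt[w] += share
            else:
                # Assumption of this sketch: dangling nodes jump to the source.
                nxt[source] += follow
        # The "restart" mass always returns to the source.
        nxt[source] += reset_prob * sum(rank.values())
        rank = nxt
    return rank


# Hypothetical graph: cycle 12 -> 13 -> 15 -> 12, plus a side edge 13 -> 14.
edges = [(12, 13), (13, 15), (15, 12), (13, 14)]
ranks = personalized_pagerank(edges, source=12)
for node, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(node, round(score, 4))
```

Running the same kind of reference computation on the actual test graph and comparing it with the GraphX output would show whether the low score for node 13 comes from the graph itself or from the implementation.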

              People

              • Assignee: Unassigned
              • Reporter: Francesco Elia (jackduluoz)
              • Votes: 0
              • Watchers: 3