Description
I'm running the personalized PageRank (PPR) implementation of GraphX on the following simple test graph:
Image: https://i.stack.imgur.com/JDv1l.jpg
I'm confused by some of the results I get when I compute the PPR for a node that is part of a cycle. For example, the final output for starting node 12 is as follows:
(13, 0.0141)
(7, 0.0141)
(19, 0.0153)
(17, 0.0153)
(20, 0.0153)
(11, 0.0391)
(14, 0.0460)
(15, 0.0541)
(16, 0.0541)
(12, 0.1832)
I would expect node 13 to have a much higher PPR value (in fact, I would expect it to rank first after the starting node itself). The same problem appears with other nodes involved in cycles: for example, with starting node 13, node 15 gets a very low score. In all the testing I have done, the results for starting nodes that do not participate in a cycle are exactly what I expect.
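To make the expectation above concrete, here is a minimal plain-Python sketch of personalized PageRank by power iteration. It is not GraphX's implementation and it does not reproduce the numbers reported above. The 4-node edge list is a hypothetical stand-in (not the graph in the image), the reset probability of 0.15 is an assumed value, and the `redistribute_sinks` flag is a name invented for this illustration. It only shows that on a small cycle with one sink, the direct successor of the starting node ranks right after the starting node, and that dropping the rank held by sinks makes the total score fall below 1.

```python
def personalized_pagerank(edges, source, reset_prob=0.15, iterations=100,
                          redistribute_sinks=True):
    """Personalized PageRank by power iteration on a directed edge list.

    Illustrative sketch only. With redistribute_sinks=True, the rank held by
    nodes without outgoing edges is returned to the source each round, so the
    scores keep summing to 1. With False, that rank is simply dropped.
    """
    nodes = sorted({n for edge in edges for n in edge} | {source})
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)

    # Start with all of the probability mass on the source node.
    rank = {n: (1.0 if n == source else 0.0) for n in nodes}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        sink_mass = 0.0
        for n in nodes:
            if successors[n]:
                # Split the damped rank evenly among the outgoing edges.
                share = (1.0 - reset_prob) * rank[n] / len(successors[n])
                for dst in successors[n]:
                    nxt[dst] += share
            else:
                # Sinks have nowhere to send their rank.
                sink_mass += (1.0 - reset_prob) * rank[n]
        # The random walk restarts at the source with probability reset_prob.
        nxt[source] += reset_prob
        if redistribute_sinks:
            nxt[source] += sink_mass
        rank = nxt
    return rank


# Hypothetical graph: cycle 1 -> 2 -> 3 -> 1, plus a sink 4 reached from 2.
edges = [(1, 2), (2, 3), (3, 1), (2, 4)]
conserving = personalized_pagerank(edges, source=1)
leaky = personalized_pagerank(edges, source=1, redistribute_sinks=False)

for node in sorted(conserving, key=conserving.get, reverse=True):
    print(node, round(conserving[node], 4), round(leaky[node], 4))
```

Running the same kind of check on the real test graph, with the edge list taken from the image, would show whether the low score for node 13 can be explained by how sink nodes are handled.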
Issue Links
- duplicates SPARK-18847 PageRank gives incorrect results for graphs with sinks (Resolved)