Uploaded image for project: 'TinkerPop'
  1. TinkerPop
  2. TINKERPOP-1783

PageRank gives incorrect results for graphs with sinks

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 3.3.0, 3.1.8, 3.2.6
    • 3.3.1
    • process

    Description

      Sink vertices (those with no outgoing edges) should evenly distribute their rank to the entire graph but in the current implementation it is just lost.

      Wiki: https://en.wikipedia.org/wiki/PageRank#Simplified_algorithm

      In the original form of PageRank, the sum of PageRank over all pages was the total number of pages on the web at that time

      I found the issue, while comparing results with the spark graphX.
      So this is a copy of https://issues.apache.org/jira/browse/SPARK-18847

      How to reproduce:

      gremlin> graph = TinkerFactory.createModern()
      gremlin> g = graph.traversal().withComputer()
      gremlin> g.V().pageRank(0.85).times(40).by('pageRank').values('pageRank').sum()
      ==>1.318625
      gremlin> g.V().pageRank(0.85).times(1).by('pageRank').values('pageRank').sum()
      ==>3.4499999999999997
      #inital values:
      gremlin> g.V().pageRank(0.85).times(0).by('pageRank').values('pageRank').sum()
      ==>6.0
      

      They fixed the issue by normalising values after each step.
      The other way to fix is to send the message to it self (stay on the same page).
      To workaround the problem just add self pointing edges:

      gremlin>g.V().as('B').addE('knows').from('B')
      

      Then you'll get always correct sum. But I'm not sure it is a proper assumption.

      Attachments

        Issue Links

          Activity

            People

              okram Marko A. Rodriguez
              artem.aliev Artem Aliev
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: