Spark / SPARK-18845

PageRank has incorrect initialization value that leads to slow convergence


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.2, 1.3.1, 1.4.1, 1.5.2, 1.6.3, 2.0.2
    • Fix Version/s: 2.2.0
    • Component/s: GraphX
    • Labels: None

      Description

      All variants of PageRank in GraphX have an incorrect initialization value that leads to slow convergence. In the current implementations, ranks are seeded with the reset probability when they should be seeded with 1. This appears to have been introduced a long time ago in https://github.com/apache/spark/commit/15a564598fe63003652b1e24527c432080b5976c#diff-b2bf3f97dcd2f19d61c921836159cda9L90
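Why 1 is the natural seed can be sketched in plain Python (this is not GraphX code). The sketch assumes the un-normalized update rank(v) = resetProb + (1 - resetProb) * sum over in-edges (u, v) of rank(u) / outDeg(u). When the graph has no sinks, each step maps the total rank S to resetProb * N + (1 - resetProb) * S, so the fixed point has total N (average rank 1) and any deficit in the total only shrinks by a factor of (1 - resetProb) per iteration. Seeding with 1 starts with the correct total; seeding with the reset probability starts with a deficit of (1 - resetProb) * N. The function name `pagerank`, the tolerance, and the example graph are hypothetical.

```python
from typing import List, Tuple


def pagerank(edges: List[Tuple[int, int]], num_vertices: int,
             reset_prob: float = 0.15, init: float = 1.0,
             tol: float = 1e-6, max_iter: int = 1000) -> Tuple[List[float], int]:
    """Iterate the (assumed) un-normalized PageRank update until the largest
    per-vertex change drops below tol. Returns (ranks, iterations_used)."""
    out_deg = [0] * num_vertices
    for src, _ in edges:
        out_deg[src] += 1

    ranks = [init] * num_vertices  # the seed value this issue is about
    for it in range(1, max_iter + 1):
        # Each vertex sends rank / out-degree along every out-edge.
        incoming = [0.0] * num_vertices
        for src, dst in edges:
            incoming[dst] += ranks[src] / out_deg[src]
        # Every vertex is recomputed, including vertices with no in-edges.
        new_ranks = [reset_prob + (1.0 - reset_prob) * s for s in incoming]
        delta = max(abs(a - b) for a, b in zip(new_ranks, ranks))
        ranks = new_ranks
        if delta < tol:
            return ranks, it
    return ranks, max_iter


# Hypothetical 3-vertex graph with no sinks: 0->1, 0->2, 1->2, 2->0.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
ranks_one, iters_one = pagerank(edges, 3, init=1.0)
ranks_reset, iters_reset = pagerank(edges, 3, init=0.15)
print(iters_one, iters_reset)
```

On this small graph both seeds reach the same ranks (summing to about 3), but the run seeded with 1 needs noticeably fewer iterations, because it never has to make up the missing total rank.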

      The incorrect seed also hides a second bug: source vertices (vertices with no incoming edges) are not updated. Because a source vertex generally* has a PageRank equal to the reset probability, seeding every vertex with the reset probability makes the stale source values look correct. Therefore both bugs need to be fixed at once.
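The masking effect can be reproduced with another minimal plain-Python sketch, again not GraphX code. Assumption: the buggy behavior is modeled as an update that only recomputes vertices receiving at least one message and leaves every other vertex at its previous value, which is how a join that skips vertices without a matching entry (such as GraphX's `joinVertices`) behaves. The `run` function, its `update_sources` flag, and the example graph are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple


def run(edges: List[Tuple[int, int]], num_vertices: int, init: float,
        update_sources: bool, reset_prob: float = 0.15,
        iterations: int = 200) -> List[float]:
    """Run a fixed number of (assumed) un-normalized PageRank steps."""
    out_deg = [0] * num_vertices
    for src, _ in edges:
        out_deg[src] += 1

    ranks = [init] * num_vertices
    for _ in range(iterations):
        # Sum of rank / out-degree over in-edges; only vertices that
        # receive at least one message appear in this dict.
        incoming: Dict[int, float] = {}
        for src, dst in edges:
            incoming[dst] = incoming.get(dst, 0.0) + ranks[src] / out_deg[src]
        new_ranks = []
        for v, old in enumerate(ranks):
            if v in incoming:
                new_ranks.append(reset_prob + (1.0 - reset_prob) * incoming[v])
            elif update_sources:
                # Fixed behavior: a source's message sum is 0, so its rank is reset_prob.
                new_ranks.append(reset_prob)
            else:
                # Buggy behavior: a source silently keeps its seed value.
                new_ranks.append(old)
        ranks = new_ranks
    return ranks


# Hypothetical graph: vertex 0 is a source feeding the cycle 1 <-> 2.
edges = [(0, 1), (1, 2), (2, 1)]
buggy_reset_seed = run(edges, 3, init=0.15, update_sources=False)
buggy_one_seed = run(edges, 3, init=1.0, update_sources=False)
fixed_one_seed = run(edges, 3, init=1.0, update_sources=True)
print(buggy_reset_seed[0])  # prints 0.15: the seed happens to be correct
print(buggy_one_seed[0])    # prints 1.0: the stale source is exposed
print(fixed_one_seed[0])    # prints 0.15: source updated correctly
```

With the reset-probability seed, the buggy update produces the same ranks as the fixed one, so the missing source update goes unnoticed; changing only the seed to 1 leaves vertex 0 stuck at 1.0 and inflates the ranks downstream of it.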

      A PR will be added shortly.

      *when there are no sinks – but that's a separate bug


    People

    • Assignee: Andrew Ray (a1ray)
    • Reporter: Andrew Ray (a1ray)
    • Votes: 0
    • Watchers: 3
