Spark / SPARK-35357

Allow turning off the normalization applied by static PageRank utilities


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1.1
    • Fix Version/s: 3.2.0
    • Component/s: GraphX
    • Labels: None

    Description

      Since SPARK-18847, the static PageRank computations in `PageRank.scala` normalize the sum of the ranks once the fixed number of iterations has completed, and a developer has no way to access the raw, non-normalized rank values.

      Since SPARK-29877, one can run a fixed number of PageRank iterations starting from the ranks of a previous `preRankGraph`.
      This nice feature opens the door to interesting incremental algorithms, for example:
      "Run some initial PageRank iterations using `PageRank.runWithOptions`, then update the graph's edges and update the ranks with a call to `PageRank.runWithOptionsWithPreviousPageRank`, and so on...".

      This kind of algorithm would gain precision from manipulating the raw ranks directly (rather than the normalized ones) when the graph has a substantial proportion of sinks (vertices without outgoing edges).
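
To make the role of sinks concrete, here is a short derivation. It assumes the usual non-personalized update with reset probability \alpha (`resetProb`), N vertices, and S the set of sinks; the symbols are introduced here for illustration and do not appear in the ticket.

```latex
r_v^{(t+1)} = \alpha + (1-\alpha) \sum_{u \to v} \frac{r_u^{(t)}}{\mathrm{outdeg}(u)}
\quad\Longrightarrow\quad
\sum_{v} r_v^{(t+1)} = \alpha N + (1-\alpha) \sum_{u \notin S} r_u^{(t)}
```

The rank held by sinks is not forwarded, so the raw sum falls below N as soon as a sink holds positive rank. Assuming the normalization rescales the ranks so that they sum to N again, it multiplies every rank by N / \sum_v r_v, a factor that grows with the share of rank held by sinks.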

      It would be nice to add a method signature with a boolean parameter that turns off the automatic normalization run at the end of `PageRank.runWithOptions` and `PageRank.runWithOptionsWithPreviousPageRank`, leaving developers free to apply the normalization only when they really need it.
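
The effect of such a switch can be sketched in plain Scala, without Spark. Everything below is hypothetical and only illustrates the idea: `RawRankSketch`, `runStatic`, `maxAbsDiff`, the `previous` parameter, and the `normalized` flag are names invented for this sketch, not the GraphX API. The normalization is assumed to rescale the ranks so that they sum to the number of vertices.

```scala
object RawRankSketch {
  type Ranks = Map[Long, Double]

  /** Toy static PageRank with a hypothetical `normalized` switch (not the GraphX API). */
  def runStatic(
      edges: Seq[(Long, Long)],
      numIter: Int,
      resetProb: Double,
      normalized: Boolean,
      previous: Option[Ranks] = None): Ranks = {
    val vertices = edges.flatMap { case (src, dst) => Seq(src, dst) }.distinct
    val outDegree = edges.groupBy(_._1).map { case (src, es) => src -> es.size }
    // Resume from the previous ranks when given, otherwise start from 1.0 everywhere.
    var ranks: Ranks =
      vertices.map(v => v -> previous.flatMap(_.get(v)).getOrElse(1.0)).toMap
    for (_ <- 1 to numIter) {
      // Each vertex sends rank / outDegree along its outgoing edges; sinks send nothing.
      val received = edges
        .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
        .groupBy(_._1)
        .map { case (dst, msgs) => dst -> msgs.map(_._2).sum }
      ranks = vertices
        .map(v => v -> (resetProb + (1.0 - resetProb) * received.getOrElse(v, 0.0)))
        .toMap
    }
    if (normalized) {
      // Assumed normalization: rescale so that the ranks sum to the vertex count.
      val factor = vertices.size / ranks.values.sum
      ranks.map { case (v, r) => v -> r * factor }
    } else ranks
  }

  /** Largest absolute difference between two rank maps over the same vertices. */
  def maxAbsDiff(a: Ranks, b: Ranks): Double =
    a.keys.map(v => math.abs(a(v) - b(v))).max

  def demo(): Unit = {
    val edges = Seq((1L, 2L), (2L, 1L), (2L, 3L)) // vertex 3 is a sink
    val direct = runStatic(edges, numIter = 6, resetProb = 0.15, normalized = false)
    // Two runs of 3 iterations, carrying either raw or normalized ranks in between.
    val raw = runStatic(edges, 3, 0.15, normalized = false)
    val norm = runStatic(edges, 3, 0.15, normalized = true)
    val viaRaw = runStatic(edges, 3, 0.15, normalized = false, previous = Some(raw))
    val viaNorm = runStatic(edges, 3, 0.15, normalized = false, previous = Some(norm))
    println(s"raw carry-over vs 6 direct iterations:        ${maxAbsDiff(direct, viaRaw)}")
    println(s"normalized carry-over vs 6 direct iterations: ${maxAbsDiff(direct, viaNorm)}")
  }
}
```

In this sketch, resuming from raw ranks reproduces the uninterrupted run, while resuming from normalized ranks does not: the update adds a constant `resetProb` term, so rescaling the ranks between two runs changes the subsequent iterates. Deferring normalization to the very end avoids that drift.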

          People

            EnzoBnl bonnal-enzo
            EnzoBnl bonnal-enzo
            Votes: 0
            Watchers: 2
