Details
Description
Gelly includes PageRank implementations for scatter-gather and gather-sum-apply. Both ship with the warning "The implementation assumes that each page has at least one incoming and one outgoing link."
PageRank is a directed algorithm and sources and sinks are common in directed graphs.
Sinks drain the total score across the graph which affects convergence and the balance of the random hop (convergence is not currently a feature of Gelly's PageRanks as this a very recent feature from FLINK-3888).
Sources are handled nicely by the algorithm highlighted on Flink's features page under "Iterations and Delta Iterations" since score deltas are transmitted and a source's score never changes (is always equal to the random hop probability divided by the vertex count).
https://flink.apache.org/features.html
We should find an implementation featuring convergence and unrestricted processing of directed graphs and move other implementations to Gelly examples.
Issue Links
- links to
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
GitHub user greghogan opened a pull request:
https://github.com/apache/flink/pull/2733
FLINK-4896[gelly] PageRank algorithm for directed graphsYou can merge this pull request into a Git repository by running:
$ git pull https://github.com/greghogan/flink 4896_pagerank_algorithm_for_directed_graphs
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2733.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2733
commit bddc6437c3202ee981f0eeeaa684e5ac4274d23a
Author: Greg Hogan <code@greghogan.com>
Date: 2016-10-24T20:14:16Z
FLINK-4896[gelly] PageRank algorithm for directed graphs