Type: New Feature
Affects Version/s: 0.6
Fix Version/s: 0.6
my name is Christoph Nagel. I'm student on technical university Berlin and participating on the course of Isabel Drost and Sebastian Schelter.
My work is to implement the pagerank-algorithm, where the pagerank-vector fits in memory.
For the computation I used the naive algorithm shown in the book 'Mining of Massive Datasets' from Rajaraman & Ullman (http://www-scf.usc.edu/~csci572/2012Spring/UllmanMiningMassiveDataSets.pdf).
Matrix- and vector-multiplication are done with mahout methods.
Most work is the transformation the input graph, which has to consists of a nodes- and edges file.
Format of nodes file: <node>\n
Format of edges file: <startNode>\t<endNode>\n
Therefore I created the following classes:
- LineIndexer: assigns each line an index
- EdgesToIndex: indexes the nodes of the edges
- EdgesIndexToTransitionMatrix: creates the transition matrix
- Pagerank: computes PR from transition matrix
- JoinNodesWithPagerank: creates the joined output
- PagerankExampleJob: does the complete job
Each class has a test (not PagerankExampleJob) and I took the example of the book for evaluating.