Details

Type: New Feature

Status: Closed

Priority: Major

Resolution: Fixed

Affects Version/s: 0.6

Fix Version/s: 0.6

Component/s: None

Labels: None
Description
Hi,
my name is Christoph Nagel. I'm a student at the Technical University of Berlin, taking the course taught by Isabel Drost and Sebastian Schelter.
My task is to implement the PageRank algorithm for the case where the PageRank vector fits in memory.
For the computation I used the naive algorithm described in the book 'Mining of Massive Datasets' by Rajaraman & Ullman (http://wwwscf.usc.edu/~csci572/2012Spring/UllmanMiningMassiveDataSets.pdf).
Matrix-vector multiplication is done with Mahout methods.
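For readers unfamiliar with the naive approach, here is a minimal sketch in plain Python rather than Mahout. It assumes a column-stochastic transition matrix and no damping (taxation) factor; whether the attached patch applies one is not stated here. The function name `pagerank`, the `iterations` parameter, and the four-node matrix are illustrative assumptions, not part of the contribution.

```python
def pagerank(transition, iterations=50):
    """Naive power iteration: repeatedly multiply the matrix with the rank vector.

    transition[i][j] is the probability of moving from node j to node i,
    so every column is assumed to sum to 1.
    """
    n = len(transition)
    ranks = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        ranks = [
            sum(transition[i][j] * ranks[j] for j in range(n))
            for i in range(n)
        ]
    return ranks


# Illustrative four-node graph (an assumption for this sketch):
# A -> B, C, D;  B -> A, D;  C -> A;  D -> B, C
M = [
    [0.0, 0.5, 1.0, 0.0],
    [1 / 3, 0.0, 0.0, 0.5],
    [1 / 3, 0.0, 0.0, 0.5],
    [1 / 3, 0.5, 0.0, 0.0],
]
ranks = pagerank(M)
```

For this particular matrix the iteration settles at 3/9 for the first node and 2/9 for each of the others. Graphs with dead ends or disconnected parts would need extra handling that this sketch leaves out.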
Most of the work is transforming the input graph, which has to consist of a nodes file and an edges file.
Format of nodes file: <node>\n
Format of edges file: <startNode>\t<endNode>\n
Therefore I created the following classes:
 LineIndexer: assigns each line an index
 EdgesToIndex: indexes the nodes of the edges
 EdgesIndexToTransitionMatrix: creates the transition matrix
 Pagerank: computes PR from transition matrix
 JoinNodesWithPagerank: creates the joined output
 PagerankExampleJob: does the complete job
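The transformation steps listed above can be sketched in memory as follows. This is not the code from the patch, which presumably runs these steps as separate jobs; the function names only loosely mirror the class names and are hypothetical, as is the sample data. It assumes the nodes and edges formats given above and gives nodes zero-based indices.

```python
from collections import defaultdict


def index_lines(lines):
    """Assign each non-empty line (a node name) a zero-based index."""
    names = (line.strip() for line in lines if line.strip())
    return {name: i for i, name in enumerate(names)}


def edges_to_index(edge_lines, node_index):
    """Translate '<startNode>\\t<endNode>' lines into pairs of node indices."""
    edges = []
    for line in edge_lines:
        if not line.strip():
            continue
        start, end = line.rstrip("\n").split("\t")
        edges.append((node_index[start], node_index[end]))
    return edges


def transition_matrix(indexed_edges, n):
    """Build a matrix whose entry [end][start] is 1 / out-degree(start)."""
    out_degree = defaultdict(int)
    for start, _ in indexed_edges:
        out_degree[start] += 1
    matrix = [[0.0] * n for _ in range(n)]
    for start, end in indexed_edges:
        matrix[end][start] += 1.0 / out_degree[start]
    return matrix


def join_nodes_with_ranks(node_index, ranks):
    """Map each node name back to the rank stored at its index."""
    return {name: ranks[i] for name, i in node_index.items()}


# Hypothetical sample input in the two file formats.
nodes = ["A\n", "B\n", "C\n", "D\n"]
edges = ["A\tB\n", "A\tC\n", "A\tD\n", "B\tA\n",
         "B\tD\n", "C\tA\n", "D\tB\n", "D\tC\n"]

node_index = index_lines(nodes)
indexed = edges_to_index(edges, node_index)
matrix = transition_matrix(indexed, len(node_index))
joined = join_nodes_with_ranks(node_index, [0.25] * len(node_index))
```

A node without outgoing edges would leave an all-zero column in `matrix`; how such dead ends should be treated is a design decision this sketch does not make.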
Each class except PagerankExampleJob has a test, and I used the example from the book for evaluation.
Thank you very much for your great work, Christoph!
Unfortunately we have to clarify whether we are legally allowed to include a pagerank implementation before we can commit your contribution...