Hi,

my name is Christoph Nagel. I'm student on technical university Berlin and participating on the course of Isabel Drost and Sebastian Schelter.

My work is to implement the pagerank-algorithm, where the pagerank-vector fits in memory.

For the computation I used the naive algorithm shown in the book 'Mining of Massive Datasets' from Rajaraman & Ullman (http://www-scf.usc.edu/~csci572/2012Spring/UllmanMiningMassiveDataSets.pdf).

Matrix- and vector-multiplication are done with mahout methods.

Most work is the transformation the input graph, which has to consists of a nodes- and edges file.

Format of nodes file: <node>\n

Format of edges file: <startNode>\t<endNode>\n

Therefore I created the following classes:

- LineIndexer: assigns each line an index
- EdgesToIndex: indexes the nodes of the edges
- EdgesIndexToTransitionMatrix: creates the transition matrix
- Pagerank: computes PR from transition matrix
- JoinNodesWithPagerank: creates the joined output
- PagerankExampleJob: does the complete job

Each class has a test (not PagerankExampleJob) and I took the example of the book for evaluating.

Thank you very much for your great work, Christoph!

Unfortunately we have to clarify whether we are legally allowed to include a pagerank implementation before we can commit your contribution...