my name is Christoph Nagel. I'm student on technical university Berlin and participating on the course of Isabel Drost and Sebastian Schelter.
My work is to implement the pagerank-algorithm, where the pagerank-vector fits in memory.
For the computation I used the naive algorithm shown in the book 'Mining of Massive Datasets' from Rajaraman & Ullman (http://www-scf.usc.edu/~csci572/2012Spring/UllmanMiningMassiveDataSets.pdf).
Matrix- and vector-multiplication are done with mahout methods.
Most work is the transformation the input graph, which has to consists of a nodes- and edges file.
Format of nodes file: <node>\n
Format of edges file: <startNode>\t<endNode>\n
Therefore I created the following classes:
- LineIndexer: assigns each line an index
- EdgesToIndex: indexes the nodes of the edges
- EdgesIndexToTransitionMatrix: creates the transition matrix
- Pagerank: computes PR from transition matrix
- JoinNodesWithPagerank: creates the joined output
- PagerankExampleJob: does the complete job
Each class has a test (not PagerankExampleJob) and I took the example of the book for evaluating.
|Status||Open [ 1 ]||Patch Available [ 10002 ]|
|Fix Version/s||0.6 [ 12316364 ]|
|Status||Patch Available [ 10002 ]||Resolved [ 5 ]|
|Assignee||Sebastian Schelter [ ssc ]|
|Resolution||Fixed [ 1 ]|
|Status||Resolved [ 5 ]||Closed [ 6 ]|