Details

Type: New Feature

Status: Closed

Priority: Major

Resolution: Fixed

Affects Version/s: 0.6

Fix Version/s: 0.6

Component/s: None

Labels: None
Description
Hi,
my name is Christoph Nagel. I'm a student at the Technical University of Berlin and am participating in the course of Isabel Drost and Sebastian Schelter.
My task is to implement the PageRank algorithm for the case where the PageRank vector fits in memory.
For the computation I used the naive algorithm shown in the book 'Mining of Massive Datasets' by Rajaraman & Ullman (http://wwwscf.usc.edu/~csci572/2012Spring/UllmanMiningMassiveDataSets.pdf).
Matrix-vector multiplication is done with Mahout methods.
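To make the computation concrete, here is a minimal sketch of the iteration v' = beta * M * v + (1 - beta) / n. It is not the code from the patch: the class name NaivePageRankSketch, the beta parameter, and the fixed iteration count are assumptions for illustration, and plain arrays stand in for Mahout's Matrix and Vector types. The four-node graph in main is also an assumption, since the ticket does not say which example from the book was used.

```java
import java.util.Arrays;

// Hypothetical class name; plain arrays stand in for Mahout's Matrix and Vector.
class NaivePageRankSketch {

    /**
     * Runs a fixed number of steps of v' = beta * M * v + (1 - beta) / n.
     * m[i][j] is the probability of moving from node j to node i,
     * so every column of m is expected to sum to 1.
     * beta = 1.0 gives the plain iteration without teleporting.
     */
    static double[] pageRank(double[][] m, double beta, int iterations) {
        int n = m.length;
        double[] v = new double[n];
        Arrays.fill(v, 1.0 / n); // start from the uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                double sum = 0.0;
                for (int j = 0; j < n; j++) {
                    sum += m[i][j] * v[j]; // row i of M times v
                }
                next[i] = beta * sum + (1.0 - beta) / n;
            }
            v = next;
        }
        return v;
    }

    public static void main(String[] args) {
        // Assumed example graph: A->B,C,D; B->A,D; C->A; D->B,C
        double[][] m = {
            {0.0,       0.5, 1.0, 0.0},
            {1.0 / 3.0, 0.0, 0.0, 0.5},
            {1.0 / 3.0, 0.0, 0.0, 0.5},
            {1.0 / 3.0, 0.5, 0.0, 0.0},
        };
        System.out.println(Arrays.toString(pageRank(m, 1.0, 100)));
        System.out.println(Arrays.toString(pageRank(m, 0.8, 100)));
    }
}
```

For this graph and beta = 1.0 the vector approaches (3/9, 2/9, 2/9, 2/9), which is the fixed point of M * v = v. The dense double loop is only for readability; for a large graph the multiplication would be done on a sparse matrix.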
Most of the work is the transformation of the input graph, which has to consist of a nodes file and an edges file.
Format of nodes file: <node>\n
Format of edges file: <startNode>\t<endNode>\n
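As an illustration of the two formats, the following sketch reads node lines and tab-separated edge lines and replaces node names with integer indices. It runs in memory on lists of lines and is not the MapReduce code from the patch; the class name GraphInputSketch, the zero-based indexing, and the error handling are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the real jobs operate on files, not in-memory lists.
class GraphInputSketch {

    /** Assigns consecutive zero-based indices to nodes in order of first appearance. */
    static Map<String, Integer> indexNodes(List<String> nodeLines) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String line : nodeLines) {
            String node = line.trim();
            if (!node.isEmpty() && !index.containsKey(node)) {
                index.put(node, index.size());
            }
        }
        return index;
    }

    /** Converts tab-separated "startNode endNode" lines into {startIndex, endIndex} pairs. */
    static List<int[]> indexEdges(List<String> edgeLines, Map<String, Integer> index) {
        List<int[]> edges = new ArrayList<>();
        for (String line : edgeLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split("\t");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed edge line: " + line);
            }
            Integer from = index.get(parts[0].trim());
            Integer to = index.get(parts[1].trim());
            if (from == null || to == null) {
                throw new IllegalArgumentException("Unknown node in edge line: " + line);
            }
            edges.add(new int[] {from, to});
        }
        return edges;
    }

    public static void main(String[] args) {
        Map<String, Integer> index = indexNodes(Arrays.asList("A", "B", "C", "D"));
        List<int[]> edges = indexEdges(Arrays.asList("A\tB", "A\tC", "C\tA"), index);
        for (int[] e : edges) {
            System.out.println(e[0] + "\t" + e[1]);
        }
    }
}
```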
Therefore I created the following classes:
 LineIndexer: assigns each line an index
 EdgesToIndex: indexes the nodes of the edges
 EdgesIndexToTransitionMatrix: creates the transition matrix
 Pagerank: computes PR from transition matrix
 JoinNodesWithPagerank: creates the joined output
 PagerankExampleJob: does the complete job
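The transition-matrix and join steps in the list above can be sketched as follows. This is an in-memory approximation under stated assumptions, not the classes from the patch: the name TransitionMatrixSketch, the dense array representation, and the "node, tab, rank" output format are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in for the transition-matrix and join steps.
class TransitionMatrixSketch {

    /**
     * Builds a column-stochastic matrix with m[to][from] = 1 / outDegree(from).
     * Nodes without outgoing edges keep an all-zero column; handling such
     * dead ends is left out of this sketch.
     */
    static double[][] transitionMatrix(int n, List<int[]> edges) {
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }
        double[][] m = new double[n][n];
        for (int[] e : edges) {
            m[e[1]][e[0]] += 1.0 / outDegree[e[0]];
        }
        return m;
    }

    /** Joins node names with their ranks, one tab-separated line per node (assumed format). */
    static List<String> joinNodesWithRank(List<String> nodes, double[] rank) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) {
            out.add(nodes.get(i) + "\t" + rank[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Indexed edges for A->B, A->C, B->A, C->A with A=0, B=1, C=2
        List<int[]> edges = Arrays.asList(
            new int[] {0, 1}, new int[] {0, 2}, new int[] {1, 0}, new int[] {2, 0});
        double[][] m = transitionMatrix(3, edges);
        System.out.println(Arrays.deepToString(m));
        System.out.println(joinNodesWithRank(Arrays.asList("A", "B", "C"),
                                             new double[] {0.5, 0.25, 0.25}));
    }
}
```

A dense n-by-n array is used only to keep the sketch short; for a real graph the matrix is sparse and would be stored that way.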
Each class except PagerankExampleJob has a test, and I used the example from the book for evaluation.
My bad, I didn't know that Mahout's org.apache.mahout.math.Matrix and its friends were so full-featured. Thanks. Then it shouldn't be any problem.
Actually, I had come across MAHOUT-879 (Remove all graph algorithms with the exception of PageRank) and was just checking with you whether large-scale sparse matrix-vector multiplication and PageRank implementations in MapReduce are welcome.