[MAHOUT-742] Pagerank implementation in Map/Reduce - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.6
Fix Version/s: 0.6
Component/s: None
Labels: None

Description

Hi,

my name is Christoph Nagel. I am a student at the Technical University of Berlin, taking the course taught by Isabel Drost and Sebastian Schelter.
My task is to implement the PageRank algorithm for the case where the PageRank vector fits in memory.
For the computation I used the naive algorithm described in the book 'Mining of Massive Datasets' by Rajaraman & Ullman (http://www-scf.usc.edu/~csci572/2012Spring/UllmanMiningMassiveDataSets.pdf).
Matrix and vector multiplication are done with Mahout methods.
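
For readers unfamiliar with the approach, here is a minimal in-memory sketch of PageRank by repeated matrix-vector multiplication. It is not the code from the patch: the function name `pagerank`, the damping factor `beta`, the fixed iteration count, and the four-node sample graph are all assumptions made for illustration, and the sketch assumes every node has at least one outgoing edge.

```python
def pagerank(edges, num_nodes, beta=0.85, iterations=100):
    """Power iteration on an edge list of (start, end) index pairs.

    Assumptions (not taken from the patch): no dead ends, no duplicate
    edges, and a uniform teleport distribution weighted by (1 - beta).
    """
    out_degree = [0] * num_nodes
    for start, _ in edges:
        out_degree[start] += 1

    # Start from the uniform distribution.
    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        # Teleport share, then add beta * M * rank one edge at a time.
        nxt = [(1.0 - beta) / num_nodes] * num_nodes
        for start, end in edges:
            nxt[end] += beta * rank[start] / out_degree[start]
        rank = nxt
    return rank


# Hypothetical sample graph: 0 -> 1,2,3; 1 -> 0,3; 2 -> 0; 3 -> 1,2
sample_edges = [(0, 1), (0, 2), (0, 3), (1, 0), (1, 3), (2, 0), (3, 1), (3, 2)]
ranks_plain = pagerank(sample_edges, 4, beta=1.0)
ranks_damped = pagerank(sample_edges, 4, beta=0.85)
```

With `beta=1.0` the loop is plain multiplication by the transition matrix; a smaller `beta` adds the usual random-jump term.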

Most of the work is transforming the input graph, which has to consist of a nodes file and an edges file.
Format of nodes file: <node>\n
Format of edges file: <startNode>\t<endNode>\n
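
The two formats above can be read roughly as follows. This is an in-memory sketch only, not the Map/Reduce code from the patch; the function names `index_nodes` and `index_edges` and the sample data are hypothetical, and assigning each node the zero-based number of its line is an assumption about how the indexing works.

```python
def index_nodes(nodes_text):
    """Map each node name to the zero-based index of its line."""
    names = [line for line in nodes_text.split("\n") if line]
    return {name: i for i, name in enumerate(names)}


def index_edges(edges_text, node_index):
    """Turn '<startNode>\\t<endNode>' lines into (start, end) index pairs."""
    edges = []
    for line in edges_text.split("\n"):
        if not line:
            continue  # skip the empty string after the final newline
        start, end = line.split("\t")
        edges.append((node_index[start], node_index[end]))
    return edges


# Hypothetical sample input in the two formats.
nodes_text = "A\nB\nC\n"
edges_text = "A\tB\nB\tC\nC\tA\n"

node_index = index_nodes(nodes_text)
indexed_edges = index_edges(edges_text, node_index)
```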

Therefore I created the following classes:

LineIndexer: assigns each line an index
EdgesToIndex: indexes the nodes of the edges
EdgesIndexToTransitionMatrix: creates the transition matrix
Pagerank: computes PR from transition matrix
JoinNodesWithPagerank: creates the joined output
PagerankExampleJob: does the complete job
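
As a rough illustration of what the transition-matrix and join steps produce, here is a small in-memory sketch. It does not reproduce the classes in the patch: the function names, the sparse dictionary layout, and the column-stochastic convention (entry at row `end`, column `start` equals 1 / out-degree of `start`) are assumptions.

```python
from collections import defaultdict


def transition_matrix(indexed_edges):
    """Build a sparse column-stochastic matrix as {(row, col): weight}.

    Assumed convention: the column is the start node and the row is
    the end node, so each non-empty column sums to 1.
    """
    out_degree = defaultdict(int)
    for start, _ in indexed_edges:
        out_degree[start] += 1
    return {(end, start): 1.0 / out_degree[start]
            for start, end in indexed_edges}


def join_nodes_with_ranks(node_names, ranks):
    """Pair each node name with the rank stored at its index."""
    return [(name, ranks[i]) for i, name in enumerate(node_names)]


# Hypothetical data: node 0 links to 1 and 2, node 1 links to 0, node 2 links to 0.
matrix = transition_matrix([(0, 1), (0, 2), (1, 0), (2, 0)])
joined = join_nodes_with_ranks(["A", "B", "C"], [0.5, 0.25, 0.25])
```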

Each class except PagerankExampleJob has a test, and I used the example from the book for evaluation.

Attachments

MAHOUT-742.patch
22/Jun/11 16:36
53 kB
Christoph Nagel

People

Assignee: Sebastian Schelter

Reporter: Christoph Nagel

Votes: 0

Watchers: 2

Dates

Created: 22/Jun/11 16:18

Updated: 31/Mar/15 22:48

Resolved: 11/Jul/11 15:49