  1. Mahout
  2. MAHOUT-742

Pagerank implementation in Map/Reduce

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.6
    • Component/s: None
    • Labels:
      None

      Description

      Hi,

      my name is Christoph Nagel. I am a student at the Technical University of Berlin, taking part in the course taught by Isabel Drost and Sebastian Schelter.
      My task is to implement the PageRank algorithm for the case where the PageRank vector fits in memory.
      For the computation I used the naive algorithm described in the book 'Mining of Massive Datasets' by Rajaraman & Ullman (http://www-scf.usc.edu/~csci572/2012Spring/UllmanMiningMassiveDataSets.pdf).
      Matrix and vector multiplication are done with Mahout methods.
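
For readers unfamiliar with the iteration, here is a minimal single-machine sketch in plain Java. It is not the code from the patch: it uses dense arrays instead of Mahout's matrix and vector classes, and the class name, the four-node graph, the damping factor of 0.8, and the fixed iteration count are all assumptions made for illustration.

```java
import java.util.Arrays;

final class PageRankSketch {

    /**
     * Power iteration with taxation: v' = beta * M * v + (1 - beta) / n.
     * m[i][j] is the probability of moving from node j to node i,
     * so every column of m is expected to sum to 1.
     */
    static double[] pageRank(double[][] m, double beta, int iterations) {
        int n = m.length;
        double[] v = new double[n];
        Arrays.fill(v, 1.0 / n); // start from the uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                double sum = 0.0;
                for (int j = 0; j < n; j++) {
                    sum += m[i][j] * v[j];
                }
                next[i] = beta * sum + (1.0 - beta) / n;
            }
            v = next;
        }
        return v;
    }

    public static void main(String[] args) {
        // Hypothetical graph: A -> B, C, D; B -> A, D; C -> A; D -> B, C.
        double[][] m = {
            {0.0,     0.5, 1.0, 0.0},
            {1.0 / 3, 0.0, 0.0, 0.5},
            {1.0 / 3, 0.0, 0.0, 0.5},
            {1.0 / 3, 0.5, 0.0, 0.0}
        };
        System.out.println(Arrays.toString(pageRank(m, 0.8, 50)));
    }
}
```

Because every column of the matrix sums to 1, each step keeps the entries of the vector summing to 1, so no renormalisation is needed in this sketch.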

      Most of the work is the transformation of the input graph, which has to consist of a nodes file and an edges file.
      Format of nodes file: <node>\n
      Format of edges file: <startNode>\t<endNode>\n
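
To make the two formats concrete, the following sketch reads both files from in-memory strings and turns node names into integer indices. It is only an illustration under stated assumptions: the class and method names are hypothetical, the indices are zero-based, and the real jobs in the patch run as Map/Reduce steps over files rather than over strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GraphInputSketch {

    /** Gives each distinct non-empty line of the nodes file the next free zero-based index. */
    static Map<String, Integer> indexNodes(String nodesFile) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String line : nodesFile.split("\n")) {
            if (!line.isEmpty()) {
                index.putIfAbsent(line, index.size());
            }
        }
        return index;
    }

    /** Translates each "<startNode>\t<endNode>" line into a pair {startIndex, endIndex}. */
    static List<int[]> indexEdges(String edgesFile, Map<String, Integer> index) {
        List<int[]> edges = new ArrayList<>();
        for (String line : edgesFile.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed edge line: " + line);
            }
            Integer start = index.get(parts[0]);
            Integer end = index.get(parts[1]);
            if (start == null || end == null) {
                throw new IllegalArgumentException("Unknown node in edge: " + line);
            }
            edges.add(new int[] {start, end});
        }
        return edges;
    }

    public static void main(String[] args) {
        Map<String, Integer> index = indexNodes("A\nB\nC\n");
        for (int[] e : indexEdges("A\tB\nA\tC\nB\tA\n", index)) {
            System.out.println(e[0] + " -> " + e[1]);
        }
    }
}
```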

      Therefore I created the following classes:

      • LineIndexer: assigns each line an index
      • EdgesToIndex: indexes the nodes of the edges
      • EdgesIndexToTransitionMatrix: creates the transition matrix
      • Pagerank: computes PR from transition matrix
      • JoinNodesWithPagerank: creates the joined output
      • PagerankExampleJob: does the complete job
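
As a rough picture of what the transition-matrix step and the final join might compute, here is a small in-memory sketch. It guesses the behaviour from the class names above and is not taken from the patch: the class and method names are hypothetical, the matrix is a dense array, and edges are assumed to be already indexed as {start, end} pairs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TransitionMatrixSketch {

    /**
     * Builds a column-stochastic matrix with m[end][start] = 1 / outDegree(start).
     * A node without outgoing edges leaves an all-zero column (a dead end),
     * which this sketch does not treat specially.
     */
    static double[][] toTransitionMatrix(int n, int[][] edges) {
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }
        double[][] m = new double[n][n];
        for (int[] e : edges) {
            m[e[1]][e[0]] += 1.0 / outDegree[e[0]];
        }
        return m;
    }

    /** Pairs each node name with the rank stored at the node's index. */
    static Map<String, Double> joinWithNames(List<String> names, double[] ranks) {
        Map<String, Double> joined = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            joined.put(names.get(i), ranks[i]);
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical graph: A -> B, C, D; B -> A, D; C -> A; D -> B, C.
        int[][] edges = {{0, 1}, {0, 2}, {0, 3}, {1, 0}, {1, 3}, {2, 0}, {3, 1}, {3, 2}};
        double[][] m = toTransitionMatrix(4, edges);
        System.out.println(Arrays.deepToString(m));
        // Placeholder ranks, used only to show the shape of the joined output.
        double[] ranks = {0.4, 0.2, 0.2, 0.2};
        System.out.println(joinWithNames(Arrays.asList("A", "B", "C", "D"), ranks));
    }
}
```

Keeping names out of the matrix computation and joining them back only at the end means the numerical part works purely on integer indices.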

      Each class except PagerankExampleJob has a test, and I used the example from the book for evaluation.

        Attachments

        1. MAHOUT-742.patch
          53 kB
          Christoph Nagel

            People

            • Assignee:
              ssc Sebastian Schelter
              Reporter:
              c.nagel Christoph Nagel
            • Votes:
              0
              Watchers:
              2
