Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: examples
    • Labels:
      None

      Description

      Implementing RWR on Giraph should be a very simple modification of the SimplePageRankVertex code.

      if ( myID == sourceID )
            DoubleWritable vertexValue = new DoubleWritable((0.15f + 0.85f * sum);
      else
            DoubleWritable vertexValue = new DoubleWritable(0.85f * sum);
      

      It would be nice to make it as configurable as possible by using parametric damping factors, preference vectors, strongly preferential, etc...
      More or less along these lines:
      http://law.dsi.unimi.it/software/docs/it/unimi/dsi/law/rank/PageRank.html

      1. GIRAPH-191.3.patch
        63 kB
        Gianmarco De Francisci Morales
      2. GIRAPH-191.2.patch
        41 kB
        Gianmarco De Francisci Morales
      3. GIRAPH-191-1.patch
        32 kB
        Sebastian Schelter
      4. GIRAPH-191.patch
        25 kB
        Sebastian Schelter

        Activity

        Gianmarco De Francisci Morales created issue -
        Hide
        Sebastian Schelter added a comment -

        A first draft for the code. It contains an abstract RandomWalkVertex which PageRank, RWR and others can extend.

        Show
        Sebastian Schelter added a comment - A first draft for the code. It contains an abstract RandomWalkVertex which PageRank, RWR and others can extend.
        Sebastian Schelter made changes -
        Field Original Value New Value
        Attachment GIRAPH-191.patch [ 12527942 ]
        Hide
        Paolo Castagna added a comment -

        It would be good if we could also use this or provide a PageRank implementation which deals with dangling nodes/vertexes properly.

        Dangling vertexes are vertexes with no edges.

        SinglePageRankVertex has:

            if (getSuperstep() < MAX_SUPERSTEPS) {
              long edges = getNumOutEdges();
              sendMsgToAllEdges(
                  new DoubleWritable(getVertexValue().get() / edges));
            } else {
              voteToHalt();
            }
        

        This does not work when getNumOutEdges() returns 0.

        Some suggest to divide the PageRank scores of dangling vertexes evenly among all other vertex (it's yet another sort of random jump to propagate PageRank scores to all nodes). This can be implemented in Giraph as a separate superstep using a SumAggregator.

        Discussion on the giraph-user mailing list with further comments and references is here:

        Show
        Paolo Castagna added a comment - It would be good if we could also use this or provide a PageRank implementation which deals with dangling nodes/vertexes properly. Dangling vertexes are vertexes with no edges. SinglePageRankVertex has: if (getSuperstep() < MAX_SUPERSTEPS) { long edges = getNumOutEdges(); sendMsgToAllEdges( new DoubleWritable(getVertexValue().get() / edges)); } else { voteToHalt(); } This does not work when getNumOutEdges() returns 0. Some suggest to divide the PageRank scores of dangling vertexes evenly among all other vertex (it's yet another sort of random jump to propagate PageRank scores to all nodes). This can be implemented in Giraph as a separate superstep using a SumAggregator. Discussion on the giraph-user mailing list with further comments and references is here: http://mail-archives.apache.org/mod_mbox/incubator-giraph-user/201205.mbox/%3C4FB509F4.4040407@googlemail.com%3E
        Sebastian Schelter made changes -
        Summary Random Walk with Restart Random Walks on Graphs
        Affects Version/s 0.2.0 [ 12319640 ]
        Component/s examples [ 12315613 ]
        Sebastian Schelter made changes -
        Attachment GIRAPH-191-1.patch [ 12528037 ]
        Hide
        Gianmarco De Francisci Morales added a comment -

        I tested the patch and it looks like it's working.
        I had to change the config parameter to add .getName() to SOURCE_VERTEX in RandomWalkWithRestartVertex.java:35

            /** Configuration parameter for the source vertex */
            static final String SOURCE_VERTEX =
                    RandomWalkWithRestartVertex.class.getName() + ".sourceVertex";
        

        Otherwise the String reads as "class blah..."

        I compared some toy output with a reference implementation and it looks good!

        I think the next step would be to support weighted graphs.
        The graph can be made stochastic on the fly, at loading time.
        Thoughts?

        I will try to hack some code.

        Show
        Gianmarco De Francisci Morales added a comment - I tested the patch and it looks like it's working. I had to change the config parameter to add .getName() to SOURCE_VERTEX in RandomWalkWithRestartVertex.java:35 /** Configuration parameter for the source vertex */ static final String SOURCE_VERTEX = RandomWalkWithRestartVertex.class.getName() + ".sourceVertex" ; Otherwise the String reads as "class blah..." I compared some toy output with a reference implementation and it looks good! I think the next step would be to support weighted graphs. The graph can be made stochastic on the fly, at loading time. Thoughts? I will try to hack some code.
        Hide
        Sebastian Schelter added a comment -

        I tested PageRank in the test with 200M vertices on 6 machines without
        problems.
        Am 18.05.2012 19:59 schrieb "Gianmarco De Francisci Morales (JIRA)" <

        Show
        Sebastian Schelter added a comment - I tested PageRank in the test with 200M vertices on 6 machines without problems. Am 18.05.2012 19:59 schrieb "Gianmarco De Francisci Morales (JIRA)" <
        Hide
        Sebastian Schelter added a comment -

        I meant the code in the patch, sry.

        Show
        Sebastian Schelter added a comment - I meant the code in the patch, sry.
        Hide
        Gianmarco De Francisci Morales added a comment -

        Attaching a work in progress, so we can synch.

        Added weighted graph support.
        Tested it on a small graph against a reference implementation.
        I didn't manage to test it properly with unit testing. I was unable to run the tests, not sure whether it's munge, mvn or something else's fault.

        Show
        Gianmarco De Francisci Morales added a comment - Attaching a work in progress, so we can synch. Added weighted graph support. Tested it on a small graph against a reference implementation. I didn't manage to test it properly with unit testing. I was unable to run the tests, not sure whether it's munge, mvn or something else's fault.
        Gianmarco De Francisci Morales made changes -
        Attachment GIRAPH-191.2.patch [ 12528488 ]
        Hide
        Gianmarco De Francisci Morales added a comment -

        Added handling of dangling nodes.
        Added the option of specifying the sources for the RWR via a file.
        Tests still need some fixing.

        I think the patch is at a point where it would need some expert eyes.

        Show
        Gianmarco De Francisci Morales added a comment - Added handling of dangling nodes. Added the option of specifying the sources for the RWR via a file. Tests still need some fixing. I think the patch is at a point where it would need some expert eyes.
        Gianmarco De Francisci Morales made changes -
        Attachment PIG-191.1.patch [ 12536635 ]
        Gianmarco De Francisci Morales made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Jakob Homan made changes -
        Assignee Gianmarco De Francisci Morales [ azaroth ]
        Hide
        Jakob Homan added a comment -

        This has gone stale (due to recent vertex api changes).

        • Review
          • Rather than creating new Input apparatus, there's nothing we can re-use? If we treat the vertex IDs as Text rather than Longs?
          • There are likely checkstyle violations in terms of line length, once the patch is rebased
          • Are the unit tests not passing? The asserts are commented out.

        Canceling patch post-review and staleification.

        Show
        Jakob Homan added a comment - This has gone stale (due to recent vertex api changes). Review Rather than creating new Input apparatus, there's nothing we can re-use? If we treat the vertex IDs as Text rather than Longs? There are likely checkstyle violations in terms of line length, once the patch is rebased Are the unit tests not passing? The asserts are commented out. Canceling patch post-review and staleification.
        Jakob Homan made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Gianmarco De Francisci Morales added a comment -

        Hi Jakob,
        I am trying to get this patch back in sync with trunk but the changes to the API make it more difficult to implement your own memory structures for the vertex.
        I need two layers of wrapping over the primitive types I am using (to save memory).
        I will fix the checkstyle problems and the tests.

        Show
        Gianmarco De Francisci Morales added a comment - Hi Jakob, I am trying to get this patch back in sync with trunk but the changes to the API make it more difficult to implement your own memory structures for the vertex. I need two layers of wrapping over the primitive types I am using (to save memory). I will fix the checkstyle problems and the tests.
        Hide
        Alessandro Presta added a comment -

        Gianmarco: can you elaborate more on the difficulties you're having with vertex implementation?

        Show
        Alessandro Presta added a comment - Gianmarco: can you elaborate more on the difficulties you're having with vertex implementation?
        Hide
        Gianmarco De Francisci Morales added a comment -

        As I said on the dev list, the main concern is creating a lot of objects just for wrapping primitives.
        Anyway, here is a more complete patch, rebased and with workings tests.

        On a side note, pleasing checkstyle makes me go nuts.
        We may want to develop some automated formatting profiles for the most common IDEs to encourage people to contribute.

        Show
        Gianmarco De Francisci Morales added a comment - As I said on the dev list, the main concern is creating a lot of objects just for wrapping primitives. Anyway, here is a more complete patch, rebased and with workings tests. On a side note, pleasing checkstyle makes me go nuts. We may want to develop some automated formatting profiles for the most common IDEs to encourage people to contribute.
        Gianmarco De Francisci Morales made changes -
        Attachment GIRAPH-191.3.patch [ 12540769 ]
        Gianmarco De Francisci Morales made changes -
        Attachment GIRAPH-191.3.patch [ 12540769 ]
        Gianmarco De Francisci Morales made changes -
        Attachment PIG-191.1.patch [ 12536635 ]
        Gianmarco De Francisci Morales made changes -
        Attachment GIRAPH-191.3.patch [ 12540774 ]
        Gianmarco De Francisci Morales made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Gianmarco De Francisci Morales added a comment -

        Tried to upload the patch on RB but even with some text mangling to hide the git/svn differences reviews.apache.org just bounces it back to me with a nice 500 error.
        No clue how to solve it, sorry.

        Show
        Gianmarco De Francisci Morales added a comment - Tried to upload the patch on RB but even with some text mangling to hide the git/svn differences reviews.apache.org just bounces it back to me with a nice 500 error. No clue how to solve it, sorry.
        Hide
        Eugene Koontz added a comment -

        Hi Gianmarco, I have the same issues with RB. I've asked the infra folks to kindly make it possible to submit Git patches:

        https://issues.apache.org/jira/browse/INFRA-5128

        I hope to hear from them soon.

        Show
        Eugene Koontz added a comment - Hi Gianmarco, I have the same issues with RB. I've asked the infra folks to kindly make it possible to submit Git patches: https://issues.apache.org/jira/browse/INFRA-5128 I hope to hear from them soon.
        Hide
        Eugene Koontz added a comment -

        Good idea about the automated formatting profiles - perhaps it could be incorporated into the "mvn idea:idea" target for use by IntelliJ users (like myself).

        Show
        Eugene Koontz added a comment - Good idea about the automated formatting profiles - perhaps it could be incorporated into the "mvn idea:idea" target for use by IntelliJ users (like myself).
        Hide
        Avery Ching added a comment -

        FYI, for IntelliJ users, you can simply make sure that you load our checkstyle file. Then you can highlight the checkstyle errors.

        Show
        Avery Ching added a comment - FYI, for IntelliJ users, you can simply make sure that you load our checkstyle file. Then you can highlight the checkstyle errors.
        Hide
        Gianmarco De Francisci Morales added a comment -

        Yes, it works the same for Eclipse with the checkstyle plugin.
        However, there is no automated way of just pressing a button and getting things straight.
        Instead one has to manually fix every single issue.

        Show
        Gianmarco De Francisci Morales added a comment - Yes, it works the same for Eclipse with the checkstyle plugin. However, there is no automated way of just pressing a button and getting things straight. Instead one has to manually fix every single issue.
        Hide
        Gianmarco De Francisci Morales added a comment -

        Forgot to mention, patch is available in RB now.
        https://reviews.apache.org/r/6653/

        Show
        Gianmarco De Francisci Morales added a comment - Forgot to mention, patch is available in RB now. https://reviews.apache.org/r/6653/
        Hide
        Gianmarco De Francisci Morales added a comment -

        Could someone review the patch before it gets too stale, please?

        Show
        Gianmarco De Francisci Morales added a comment - Could someone review the patch before it gets too stale, please?
        Hide
        Avery Ching added a comment -

        Gianmarco, I'll try and take a look today or tomorrow. Thanks for the patch!

        Show
        Avery Ching added a comment - Gianmarco, I'll try and take a look today or tomorrow. Thanks for the patch!
        Hide
        Eli Reisman added a comment -

        I can try to take a look at this too if Avery is busy, sorry about the delay I have been moving...

        Show
        Eli Reisman added a comment - I can try to take a look at this too if Avery is busy, sorry about the delay I have been moving...
        Hide
        Gianmarco De Francisci Morales added a comment -

        Thanks Eli, that would be great!

        Show
        Gianmarco De Francisci Morales added a comment - Thanks Eli, that would be great!
        Hide
        Eli Reisman added a comment -

        +1, committed. Thanks again.

        Show
        Eli Reisman added a comment - +1, committed. Thanks again.
        Hide
        Avery Ching added a comment -

        Thanks Eli, I got backed up this weekend. Thanks Gianmarco for the work!

        Show
        Avery Ching added a comment - Thanks Eli, I got backed up this weekend. Thanks Gianmarco for the work!
        Hide
        Hudson added a comment -

        Integrated in Giraph-trunk-Commit #192 (See https://builds.apache.org/job/Giraph-trunk-Commit/192/)
        GIRAPH-191: Random Walks On Graphs (Gianmarco De Francisci Morales via ereisman) (Revision 1383115)

        Result = SUCCESS
        ereisman : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1383115
        Files :

        • /giraph/trunk/CHANGELOG
        • /giraph/trunk/checkstyle.xml
        • /giraph/trunk/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java
        • /giraph/trunk/src/main/java/org/apache/giraph/examples/LongDoubleFloatDoubleTextInputFormat.java
        • /giraph/trunk/src/main/java/org/apache/giraph/examples/NormalizingLongDoubleFloatDoubleTextInputFormat.java
        • /giraph/trunk/src/main/java/org/apache/giraph/examples/RandomWalkVertex.java
        • /giraph/trunk/src/main/java/org/apache/giraph/examples/RandomWalkWithRestartVertex.java
        • /giraph/trunk/src/main/java/org/apache/giraph/examples/RandomWalkWorkerContext.java
        • /giraph/trunk/src/main/java/org/apache/giraph/examples/VertexWithDoubleValueFloatEdgeTextOutputFormat.java
        • /giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleEdgeListVertex.java
        • /giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleNullDoubleVertex.java
        • /giraph/trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java
        • /giraph/trunk/src/main/java/org/apache/giraph/utils/MathUtils.java
        • /giraph/trunk/src/main/java/org/apache/giraph/utils/UnmodifiableDoubleArrayIterator.java
        • /giraph/trunk/src/main/java/org/apache/giraph/utils/UnmodifiableLongArrayIterator.java
        • /giraph/trunk/src/main/java/org/apache/giraph/utils/UnmodifiableLongFloatEdgeArrayIterable.java
        • /giraph/trunk/src/main/java/org/apache/giraph/utils/UnmodifiableLongNullEdgeArrayIterable.java
        • /giraph/trunk/src/test/java/org/apache/giraph/examples/RandomWalkWithRestartVertexTest.java
        Show
        Hudson added a comment - Integrated in Giraph-trunk-Commit #192 (See https://builds.apache.org/job/Giraph-trunk-Commit/192/ ) GIRAPH-191 : Random Walks On Graphs (Gianmarco De Francisci Morales via ereisman) (Revision 1383115) Result = SUCCESS ereisman : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1383115 Files : /giraph/trunk/CHANGELOG /giraph/trunk/checkstyle.xml /giraph/trunk/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java /giraph/trunk/src/main/java/org/apache/giraph/examples/LongDoubleFloatDoubleTextInputFormat.java /giraph/trunk/src/main/java/org/apache/giraph/examples/NormalizingLongDoubleFloatDoubleTextInputFormat.java /giraph/trunk/src/main/java/org/apache/giraph/examples/RandomWalkVertex.java /giraph/trunk/src/main/java/org/apache/giraph/examples/RandomWalkWithRestartVertex.java /giraph/trunk/src/main/java/org/apache/giraph/examples/RandomWalkWorkerContext.java /giraph/trunk/src/main/java/org/apache/giraph/examples/VertexWithDoubleValueFloatEdgeTextOutputFormat.java /giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleEdgeListVertex.java /giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleNullDoubleVertex.java /giraph/trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java /giraph/trunk/src/main/java/org/apache/giraph/utils/MathUtils.java /giraph/trunk/src/main/java/org/apache/giraph/utils/UnmodifiableDoubleArrayIterator.java /giraph/trunk/src/main/java/org/apache/giraph/utils/UnmodifiableLongArrayIterator.java /giraph/trunk/src/main/java/org/apache/giraph/utils/UnmodifiableLongFloatEdgeArrayIterable.java /giraph/trunk/src/main/java/org/apache/giraph/utils/UnmodifiableLongNullEdgeArrayIterable.java /giraph/trunk/src/test/java/org/apache/giraph/examples/RandomWalkWithRestartVertexTest.java
        Avery Ching made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]

          People

          • Assignee:
            Gianmarco De Francisci Morales
            Reporter:
            Gianmarco De Francisci Morales
          • Votes:
            1 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development