Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.6
    • Component/s: Graph
    • Labels:
      None

      Description

      We are Tillmann Fiehn and Sebastian Arnold, IT students from TU Berlin. As Sebastian Schelter already announced, we are atteding Isabel's and Sebastian's class "Large scale data analysis and data mining" and picked an interesting project that we want to implement in Mahout. We are open for any hints and suggestions and would appreciate if you could share your thoughts on our proposal.

      Our goal is to implement a map/reduce algorithm for finding k-trusses in a given graph. A k-truss is a nontrivial, single-component maximal subgraph, such that every edge is contained in at least k-2 triangles in the subgraph. The algorithm was proposed in the IEEE paper J. Cohen 2009: "Graph Twiddling in a MapReduce World" (http://www.csee.usf.edu/~anda/CIS6930-S11/papers/graph-processing-w-mapreduce.pdf) and involves a number of graph algorithms that are to our knowledge currently not present in Mahout:

      Goal: finding K-Trusses

      • relaxation of k-member clique
      • non-trivial, single-component maximal subgraph, s.t.
      • every edge is contained in at least k-2 triangles in the subgraph

      Algorithms to be implemented on top of Mahout / Hadoop:

      simplifyGraph: Edges -> RepresentativeEdges

      • removes Loops (not cycles)
      • aggregate duplicate edges

      augmentGraphWithDegrees: RepresentativeEdges -> AugmentedEdges = (Edge (v, u) , d(v), d(u))

      • augements the edges with degree information for both nodes d(v) = | {E | E = (x,y) a. (x = v o. y = v) }

        |

      enumerateTriangles: AugmentedEdges = (Edge, d(v), d(u)) -> Triangles (v, u, s)

      • finds all triangles in a Graph

      findComponents: RepresentativeEdges -> ZoneAssignments (v, z)

      • finds all components of a graph, each identified as the order number of the lowest-order vertex contained
      • consists of:
        • step 1: find adjacent zones: Edges x Zones -> InterzoneEdges (z, z)
        • step 2: merge adjacent zones into one (the lowest-order neighbouring zone): InterzoneEdges, ZoneAssignments (v, z) -> Pairs (v, z)
      while true do:
        step 1
        if empty set interzone edges break;
        step 2
      done
      

      findKTrusses: Edges, k -> ZoneAssignments (v, z)

      • finds all k-trusses of the graph
      • each returned vertex v is part of a truss z
      simplifyGraph
      while true do:
        augmentGraphWithDegrees
        enumerateTriangles
        keep only edges contained in k-2 triangles
        if all edges kept break;
      done
      findComponents
      

      We suppose to create the package org.apache.mahout.graph.trusses and org.apache.mahout.graph.components in the core module.

        Activity

        Hide
        Tillmann Fiehn added a comment -

        Hi Mahouters!

        We worked on the features requested and are proud to present a first unit.

        We implemented

        • SimplifyGraph which can process a variaty of graphs
        • AugmentGraphWithDegrees which augments a graph with degrees to increase stability for future processings
        • EnumerateTriangles which enumerates the triangles of a graph

        This three jobs and a set of tests to proof functionality and usage a available from:
        https://TillmannFiehn@github.com/TillmannFiehn/ktrusses.git
        In there is a mvn project that builds against mahout-0.5.

        Show
        Tillmann Fiehn added a comment - Hi Mahouters! We worked on the features requested and are proud to present a first unit. We implemented SimplifyGraph which can process a variaty of graphs AugmentGraphWithDegrees which augments a graph with degrees to increase stability for future processings EnumerateTriangles which enumerates the triangles of a graph This three jobs and a set of tests to proof functionality and usage a available from: https://TillmannFiehn@github.com/TillmannFiehn/ktrusses.git In there is a mvn project that builds against mahout-0.5.
        Hide
        Tillmann Fiehn added a comment -

        Feature attachment of graph algorithms:
        In preparation for k-trusses we implemented
        SimplifyGraph
        AugmentGraphWithDegrees
        EnumerateTriangles

        Show
        Tillmann Fiehn added a comment - Feature attachment of graph algorithms: In preparation for k-trusses we implemented SimplifyGraph AugmentGraphWithDegrees EnumerateTriangles
        Hide
        Sebastian Schelter added a comment -

        Hello Tillmann and Sebastian,

        I skimmed through your code and it looks great, nice job.

        There are some minor issues with dependencies (stratosphere is not publicly available as maven artifact e.g.) and code conventions, but those can be addressed with little effort. After that is done, I'm looking forward to a SVN conform patch that can be integrated!

        Show
        Sebastian Schelter added a comment - Hello Tillmann and Sebastian, I skimmed through your code and it looks great, nice job. There are some minor issues with dependencies (stratosphere is not publicly available as maven artifact e.g.) and code conventions, but those can be addressed with little effort. After that is done, I'm looking forward to a SVN conform patch that can be integrated!
        Hide
        Tillmann Fiehn added a comment -

        Hi all,
        this is the requested SVN conform patch.
        I appretiate any comments and corrections.
        Greetings Tillmann

        Show
        Tillmann Fiehn added a comment - Hi all, this is the requested SVN conform patch. I appretiate any comments and corrections. Greetings Tillmann
        Hide
        Sean Owen added a comment -

        The code looks tidy and well-thought-out.
        I don't know this algorithm and would love to hear about real-world use cases for it. (Maybe you can announce this on the user@ list with some comments on what it is used for.
        It seems interesting enough to have such an algorithm that I would say go ahead.

        Show
        Sean Owen added a comment - The code looks tidy and well-thought-out. I don't know this algorithm and would love to hear about real-world use cases for it. (Maybe you can announce this on the user@ list with some comments on what it is used for. It seems interesting enough to have such an algorithm that I would say go ahead.
        Hide
        Hector Yee added a comment -

        I'd love to beta test it soon. How will Enumerate Triangle perform on a graph of tens of millions of nodes?

        Show
        Hector Yee added a comment - I'd love to beta test it soon. How will Enumerate Triangle perform on a graph of tens of millions of nodes?
        Hide
        Sebastian Schelter added a comment -

        @Tillmann and Sebastian

        I committed your patch with some major modifications. While your generic writable approach was very elegant, I removed them for more simple writables and more expressive signatures. I tried to streamline and polish a lot of things as well as refactor them to use Mahout-ish conventions.

        Yet I did not change any piece of the algorithmic basis and I want to again thank you for that awesome piece of work of yours! Kudos in the name of the Mahout community!

        @Sean

        In short words, enumerating triangles (three interconnected nodes) in a graph is used as a preprocessing step for finding strongly connected "communities" in a graph. Theoretically you want to find "maximum cliques" in a graph but this is NP-hard and real cliques are rare in reality. The final goal of this issue is to create an algorithm to find k-trusses, which are a relaxation of those cliques. Maybe Tillmann and Sebastian want to write a little more about the usecase on the mailinglist, I'll ask them.

        @Hector

        Would be great to see you beta-test this code on large graphs. The running time should be very dependent on the degree distribution in the graph as the number of "open triads" (possible triangles with one missing edge) to consider per vertex is the square of its degree.

        Show
        Sebastian Schelter added a comment - @Tillmann and Sebastian I committed your patch with some major modifications. While your generic writable approach was very elegant, I removed them for more simple writables and more expressive signatures. I tried to streamline and polish a lot of things as well as refactor them to use Mahout-ish conventions. Yet I did not change any piece of the algorithmic basis and I want to again thank you for that awesome piece of work of yours! Kudos in the name of the Mahout community! @Sean In short words, enumerating triangles (three interconnected nodes) in a graph is used as a preprocessing step for finding strongly connected "communities" in a graph. Theoretically you want to find "maximum cliques" in a graph but this is NP-hard and real cliques are rare in reality. The final goal of this issue is to create an algorithm to find k-trusses, which are a relaxation of those cliques. Maybe Tillmann and Sebastian want to write a little more about the usecase on the mailinglist, I'll ask them. @Hector Would be great to see you beta-test this code on large graphs. The running time should be very dependent on the degree distribution in the graph as the number of "open triads" (possible triangles with one missing edge) to consider per vertex is the square of its degree.
        Hide
        Hudson added a comment -

        Integrated in Mahout-Quality #892 (See https://builds.apache.org/job/Mahout-Quality/892/)
        MAHOUT-710 graph primitives and triangle enumeration code

        ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1137447
        Files :

        • /mahout/trunk/core
        • /mahout/trunk/core/src/test/java/org/apache/mahout/graph/triangles
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/VertexWithDegree.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common/AugmentGraphWithDegreesJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/UndirectedEdgeWithDegrees.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/package-info.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common/SimplifyGraphJob.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/graph
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/package-info.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common/package-info.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/package-info.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/graph/common/AugmentGraphWithDegreesJobTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/impl/TasteTestCase.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/graph/common/SimplifyGraphJobTest.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/EnumerateTrianglesJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/Triangle.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/graph/common
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/Vertex.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/graph/triangles/EnumerateTrianglesJobTest.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/JoinableUndirectedEdge.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/common/MahoutTestCase.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/VertexOrMarker.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/UndirectedEdge.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph
        Show
        Hudson added a comment - Integrated in Mahout-Quality #892 (See https://builds.apache.org/job/Mahout-Quality/892/ ) MAHOUT-710 graph primitives and triangle enumeration code ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1137447 Files : /mahout/trunk/core /mahout/trunk/core/src/test/java/org/apache/mahout/graph/triangles /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/VertexWithDegree.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common/AugmentGraphWithDegreesJob.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/UndirectedEdgeWithDegrees.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/package-info.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common/SimplifyGraphJob.java /mahout/trunk/core/src/test/java/org/apache/mahout/graph /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/package-info.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common/package-info.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/package-info.java /mahout/trunk/core/src/test/java/org/apache/mahout/graph/common/AugmentGraphWithDegreesJobTest.java /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/impl/TasteTestCase.java /mahout/trunk/core/src/test/java/org/apache/mahout/graph/common/SimplifyGraphJobTest.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/EnumerateTrianglesJob.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/Triangle.java /mahout/trunk/core/src/test/java/org/apache/mahout/graph/common /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/Vertex.java /mahout/trunk/core/src/test/java/org/apache/mahout/graph/triangles/EnumerateTrianglesJobTest.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/JoinableUndirectedEdge.java /mahout/trunk/core/src/test/java/org/apache/mahout/common/MahoutTestCase.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/VertexOrMarker.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/UndirectedEdge.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common /mahout/trunk/core/src/main/java/org/apache/mahout/graph
        Hide
        Sebastian Schelter added a comment -

        That one's not done yet, the committed patch only contains the preprocessing steps (which I committed because they are themselves already very useful).

        Should we reopen this or edit it and create a new issue for the remaining functionality?

        Show
        Sebastian Schelter added a comment - That one's not done yet, the committed patch only contains the preprocessing steps (which I committed because they are themselves already very useful). Should we reopen this or edit it and create a new issue for the remaining functionality?
        Hide
        Tillmann Fiehn added a comment -

        I think, we should reopen it because it sums up documentation for all the functionality.
        However: I am working on the issue and can submit another patch soon.

        Show
        Tillmann Fiehn added a comment - I think, we should reopen it because it sums up documentation for all the functionality. However: I am working on the issue and can submit another patch soon.
        Hide
        Tillmann Fiehn added a comment - - edited

        Giving implementations for graph algorithms:
        find components in a graph
        find k-trusses in a graph

        Featuring:
        javadoc
        toy integration tests with multiple recursion runs

        I am happy to receive further ideas on improvements.

        Show
        Tillmann Fiehn added a comment - - edited Giving implementations for graph algorithms: find components in a graph find k-trusses in a graph Featuring: javadoc toy integration tests with multiple recursion runs I am happy to receive further ideas on improvements.
        Hide
        Tillmann Fiehn added a comment -

        Thank you very much for the refactorings and support on the first patch.
        Here is another one with the pending algorithms.

        Show
        Tillmann Fiehn added a comment - Thank you very much for the refactorings and support on the first patch. Here is another one with the pending algorithms.
        Hide
        Sean Owen added a comment -

        Is the second patch OK to commit Sebastian? I would assume so.
        I am unable to apply it as it's gone out of sync with HEAD. Tillmann can you try to reintegrate it?

        Show
        Sean Owen added a comment - Is the second patch OK to commit Sebastian? I would assume so. I am unable to apply it as it's gone out of sync with HEAD. Tillmann can you try to reintegrate it?
        Hide
        Sebastian Schelter added a comment -

        I'm a bit undecided on this one.

        We implemented triangle enumeration as a preprocessing step like it is described in the article that Tillmann referenced but when we tried it on medium sized graphs (100M edges) the performance was horrible...

        It would be tragic to let the work Tillmann and Sebastian A. put in here go to waste, but the algorithm as it is described in the article is clearly not suited for production systems...

        Show
        Sebastian Schelter added a comment - I'm a bit undecided on this one. We implemented triangle enumeration as a preprocessing step like it is described in the article that Tillmann referenced but when we tried it on medium sized graphs (100M edges) the performance was horrible... It would be tragic to let the work Tillmann and Sebastian A. put in here go to waste, but the algorithm as it is described in the article is clearly not suited for production systems...
        Hide
        Sean Owen added a comment -

        Well some good code was already submitted as part of this. An experiment is worthwhile even if it merely shows something doesn't work. The code and issue remain here in JIRA. I assume everyone learned a bit as a result. I'd say close this if there's no indication that there's a submit coming in the near future. Of course, this can always be found and reopened by the contributors or anyone that wants to revisit it.

        Show
        Sean Owen added a comment - Well some good code was already submitted as part of this. An experiment is worthwhile even if it merely shows something doesn't work. The code and issue remain here in JIRA. I assume everyone learned a bit as a result. I'd say close this if there's no indication that there's a submit coming in the near future. Of course, this can always be found and reopened by the contributors or anyone that wants to revisit it.
        Hide
        Sebastian Schelter added a comment -

        Unfortunately it is already that commited code that (to my impression) does not fit the MapReduce paradigm and should be removed...

        Show
        Sebastian Schelter added a comment - Unfortunately it is already that commited code that (to my impression) does not fit the MapReduce paradigm and should be removed...

          People

          • Assignee:
            Sebastian Schelter
            Reporter:
            Tillmann Fiehn
          • Votes:
            1 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 672h
              672h
              Remaining:
              Remaining Estimate - 672h
              672h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development