# Implementing K-Trusses

## Details

• Type: New Feature
• Status: Closed
• Priority: Major
• Resolution: Fixed
• Affects Version/s: 0.5
• Fix Version/s:
• Component/s: None
• Labels:
None

## Description

We are Tillmann Fiehn and Sebastian Arnold, IT students from TU Berlin. As Sebastian Schelter already announced, we are atteding Isabel's and Sebastian's class "Large scale data analysis and data mining" and picked an interesting project that we want to implement in Mahout. We are open for any hints and suggestions and would appreciate if you could share your thoughts on our proposal.

Our goal is to implement a map/reduce algorithm for finding k-trusses in a given graph. A k-truss is a nontrivial, single-component maximal subgraph, such that every edge is contained in at least k-2 triangles in the subgraph. The algorithm was proposed in the IEEE paper J. Cohen 2009: "Graph Twiddling in a MapReduce World" (http://www.csee.usf.edu/~anda/CIS6930-S11/papers/graph-processing-w-mapreduce.pdf) and involves a number of graph algorithms that are to our knowledge currently not present in Mahout:

Goal: finding K-Trusses

• relaxation of k-member clique
• non-trivial, single-component maximal subgraph, s.t.
• every edge is contained in at least k-2 triangles in the subgraph

Algorithms to be implemented on top of Mahout / Hadoop:

simplifyGraph: Edges -> RepresentativeEdges

• removes Loops (not cycles)
• aggregate duplicate edges

augmentGraphWithDegrees: RepresentativeEdges -> AugmentedEdges = (Edge (v, u) , d(v), d(u))

• augements the edges with degree information for both nodes d(v) = | {E | E = (x,y) a. (x = v o. y = v) }

|

enumerateTriangles: AugmentedEdges = (Edge, d(v), d(u)) -> Triangles (v, u, s)

• finds all triangles in a Graph

findComponents: RepresentativeEdges -> ZoneAssignments (v, z)

• finds all components of a graph, each identified as the order number of the lowest-order vertex contained
• consists of:
• step 1: find adjacent zones: Edges x Zones -> InterzoneEdges (z, z)
• step 2: merge adjacent zones into one (the lowest-order neighbouring zone): InterzoneEdges, ZoneAssignments (v, z) -> Pairs (v, z)
```while true do:
step 1
if empty set interzone edges break;
step 2
done
```

findKTrusses: Edges, k -> ZoneAssignments (v, z)

• finds all k-trusses of the graph
• each returned vertex v is part of a truss z
```simplifyGraph
while true do:
augmentGraphWithDegrees
enumerateTriangles
keep only edges contained in k-2 triangles
if all edges kept break;
done
findComponents
```

We suppose to create the package org.apache.mahout.graph.trusses and org.apache.mahout.graph.components in the core module.

## Activity

Hide
Tillmann Fiehn added a comment -

Hi Mahouters!

We worked on the features requested and are proud to present a first unit.

We implemented

• SimplifyGraph which can process a variaty of graphs
• AugmentGraphWithDegrees which augments a graph with degrees to increase stability for future processings
• EnumerateTriangles which enumerates the triangles of a graph

This three jobs and a set of tests to proof functionality and usage a available from:
https://TillmannFiehn@github.com/TillmannFiehn/ktrusses.git
In there is a mvn project that builds against mahout-0.5.

Show
Tillmann Fiehn added a comment - Hi Mahouters! We worked on the features requested and are proud to present a first unit. We implemented SimplifyGraph which can process a variaty of graphs AugmentGraphWithDegrees which augments a graph with degrees to increase stability for future processings EnumerateTriangles which enumerates the triangles of a graph This three jobs and a set of tests to proof functionality and usage a available from: https://TillmannFiehn@github.com/TillmannFiehn/ktrusses.git In there is a mvn project that builds against mahout-0.5.
Hide
Tillmann Fiehn added a comment -

Feature attachment of graph algorithms:
In preparation for k-trusses we implemented
SimplifyGraph
AugmentGraphWithDegrees
EnumerateTriangles

Show
Tillmann Fiehn added a comment - Feature attachment of graph algorithms: In preparation for k-trusses we implemented SimplifyGraph AugmentGraphWithDegrees EnumerateTriangles
Hide
Sebastian Schelter added a comment -

Hello Tillmann and Sebastian,

I skimmed through your code and it looks great, nice job.

There are some minor issues with dependencies (stratosphere is not publicly available as maven artifact e.g.) and code conventions, but those can be addressed with little effort. After that is done, I'm looking forward to a SVN conform patch that can be integrated!

Show
Sebastian Schelter added a comment - Hello Tillmann and Sebastian, I skimmed through your code and it looks great, nice job. There are some minor issues with dependencies (stratosphere is not publicly available as maven artifact e.g.) and code conventions, but those can be addressed with little effort. After that is done, I'm looking forward to a SVN conform patch that can be integrated!
Hide
Tillmann Fiehn added a comment -

Hi all,
this is the requested SVN conform patch.
I appretiate any comments and corrections.
Greetings Tillmann

Show
Tillmann Fiehn added a comment - Hi all, this is the requested SVN conform patch. I appretiate any comments and corrections. Greetings Tillmann
Hide
Sean Owen added a comment -

The code looks tidy and well-thought-out.
I don't know this algorithm and would love to hear about real-world use cases for it. (Maybe you can announce this on the user@ list with some comments on what it is used for.
It seems interesting enough to have such an algorithm that I would say go ahead.

Show
Sean Owen added a comment - The code looks tidy and well-thought-out. I don't know this algorithm and would love to hear about real-world use cases for it. (Maybe you can announce this on the user@ list with some comments on what it is used for. It seems interesting enough to have such an algorithm that I would say go ahead.
Hide
Hector Yee added a comment -

I'd love to beta test it soon. How will Enumerate Triangle perform on a graph of tens of millions of nodes?

Show
Hector Yee added a comment - I'd love to beta test it soon. How will Enumerate Triangle perform on a graph of tens of millions of nodes?
Hide
Sebastian Schelter added a comment -

@Tillmann and Sebastian

I committed your patch with some major modifications. While your generic writable approach was very elegant, I removed them for more simple writables and more expressive signatures. I tried to streamline and polish a lot of things as well as refactor them to use Mahout-ish conventions.

Yet I did not change any piece of the algorithmic basis and I want to again thank you for that awesome piece of work of yours! Kudos in the name of the Mahout community!

@Sean

In short words, enumerating triangles (three interconnected nodes) in a graph is used as a preprocessing step for finding strongly connected "communities" in a graph. Theoretically you want to find "maximum cliques" in a graph but this is NP-hard and real cliques are rare in reality. The final goal of this issue is to create an algorithm to find k-trusses, which are a relaxation of those cliques. Maybe Tillmann and Sebastian want to write a little more about the usecase on the mailinglist, I'll ask them.

@Hector

Would be great to see you beta-test this code on large graphs. The running time should be very dependent on the degree distribution in the graph as the number of "open triads" (possible triangles with one missing edge) to consider per vertex is the square of its degree.

Show
Sebastian Schelter added a comment - @Tillmann and Sebastian I committed your patch with some major modifications. While your generic writable approach was very elegant, I removed them for more simple writables and more expressive signatures. I tried to streamline and polish a lot of things as well as refactor them to use Mahout-ish conventions. Yet I did not change any piece of the algorithmic basis and I want to again thank you for that awesome piece of work of yours! Kudos in the name of the Mahout community! @Sean In short words, enumerating triangles (three interconnected nodes) in a graph is used as a preprocessing step for finding strongly connected "communities" in a graph. Theoretically you want to find "maximum cliques" in a graph but this is NP-hard and real cliques are rare in reality. The final goal of this issue is to create an algorithm to find k-trusses, which are a relaxation of those cliques. Maybe Tillmann and Sebastian want to write a little more about the usecase on the mailinglist, I'll ask them. @Hector Would be great to see you beta-test this code on large graphs. The running time should be very dependent on the degree distribution in the graph as the number of "open triads" (possible triangles with one missing edge) to consider per vertex is the square of its degree.
Hide

Integrated in Mahout-Quality #892 (See https://builds.apache.org/job/Mahout-Quality/892/)
MAHOUT-710 graph primitives and triangle enumeration code

• /mahout/trunk/core
• /mahout/trunk/core/src/test/java/org/apache/mahout/graph/triangles
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/VertexWithDegree.java
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common/AugmentGraphWithDegreesJob.java
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/UndirectedEdgeWithDegrees.java
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/package-info.java
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common/SimplifyGraphJob.java
• /mahout/trunk/core/src/test/java/org/apache/mahout/graph
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/package-info.java
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common/package-info.java
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/package-info.java
• /mahout/trunk/core/src/test/java/org/apache/mahout/graph/common/AugmentGraphWithDegreesJobTest.java
• /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/impl/TasteTestCase.java
• /mahout/trunk/core/src/test/java/org/apache/mahout/graph/common/SimplifyGraphJobTest.java
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/EnumerateTrianglesJob.java
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/Triangle.java
• /mahout/trunk/core/src/test/java/org/apache/mahout/graph/common
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/Vertex.java
• /mahout/trunk/core/src/test/java/org/apache/mahout/graph/triangles/EnumerateTrianglesJobTest.java
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/JoinableUndirectedEdge.java
• /mahout/trunk/core/src/test/java/org/apache/mahout/common/MahoutTestCase.java
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/VertexOrMarker.java
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/UndirectedEdge.java
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common
• /mahout/trunk/core/src/main/java/org/apache/mahout/graph
Show
Hudson added a comment - Integrated in Mahout-Quality #892 (See https://builds.apache.org/job/Mahout-Quality/892/ ) MAHOUT-710 graph primitives and triangle enumeration code ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1137447 Files : /mahout/trunk/core /mahout/trunk/core/src/test/java/org/apache/mahout/graph/triangles /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/VertexWithDegree.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common/AugmentGraphWithDegreesJob.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/UndirectedEdgeWithDegrees.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/package-info.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common/SimplifyGraphJob.java /mahout/trunk/core/src/test/java/org/apache/mahout/graph /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/package-info.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common/package-info.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/package-info.java /mahout/trunk/core/src/test/java/org/apache/mahout/graph/common/AugmentGraphWithDegreesJobTest.java /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/impl/TasteTestCase.java /mahout/trunk/core/src/test/java/org/apache/mahout/graph/common/SimplifyGraphJobTest.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/EnumerateTrianglesJob.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/Triangle.java /mahout/trunk/core/src/test/java/org/apache/mahout/graph/common /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/Vertex.java /mahout/trunk/core/src/test/java/org/apache/mahout/graph/triangles/EnumerateTrianglesJobTest.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/JoinableUndirectedEdge.java /mahout/trunk/core/src/test/java/org/apache/mahout/common/MahoutTestCase.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles/VertexOrMarker.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/triangles /mahout/trunk/core/src/main/java/org/apache/mahout/graph/model/UndirectedEdge.java /mahout/trunk/core/src/main/java/org/apache/mahout/graph/common /mahout/trunk/core/src/main/java/org/apache/mahout/graph
Hide
Sebastian Schelter added a comment -

That one's not done yet, the committed patch only contains the preprocessing steps (which I committed because they are themselves already very useful).

Should we reopen this or edit it and create a new issue for the remaining functionality?

Show
Sebastian Schelter added a comment - That one's not done yet, the committed patch only contains the preprocessing steps (which I committed because they are themselves already very useful). Should we reopen this or edit it and create a new issue for the remaining functionality?
Hide
Tillmann Fiehn added a comment -

I think, we should reopen it because it sums up documentation for all the functionality.
However: I am working on the issue and can submit another patch soon.

Show
Tillmann Fiehn added a comment - I think, we should reopen it because it sums up documentation for all the functionality. However: I am working on the issue and can submit another patch soon.
Hide
Tillmann Fiehn added a comment - - edited

Giving implementations for graph algorithms:
find components in a graph
find k-trusses in a graph

Featuring:
toy integration tests with multiple recursion runs

I am happy to receive further ideas on improvements.

Show
Tillmann Fiehn added a comment - - edited Giving implementations for graph algorithms: find components in a graph find k-trusses in a graph Featuring: javadoc toy integration tests with multiple recursion runs I am happy to receive further ideas on improvements.
Hide
Tillmann Fiehn added a comment -

Thank you very much for the refactorings and support on the first patch.
Here is another one with the pending algorithms.

Show
Tillmann Fiehn added a comment - Thank you very much for the refactorings and support on the first patch. Here is another one with the pending algorithms.
Hide
Sean Owen added a comment -

Is the second patch OK to commit Sebastian? I would assume so.
I am unable to apply it as it's gone out of sync with HEAD. Tillmann can you try to reintegrate it?

Show
Sean Owen added a comment - Is the second patch OK to commit Sebastian? I would assume so. I am unable to apply it as it's gone out of sync with HEAD. Tillmann can you try to reintegrate it?
Hide
Sebastian Schelter added a comment -

I'm a bit undecided on this one.

We implemented triangle enumeration as a preprocessing step like it is described in the article that Tillmann referenced but when we tried it on medium sized graphs (100M edges) the performance was horrible...

It would be tragic to let the work Tillmann and Sebastian A. put in here go to waste, but the algorithm as it is described in the article is clearly not suited for production systems...

Show
Sebastian Schelter added a comment - I'm a bit undecided on this one. We implemented triangle enumeration as a preprocessing step like it is described in the article that Tillmann referenced but when we tried it on medium sized graphs (100M edges) the performance was horrible... It would be tragic to let the work Tillmann and Sebastian A. put in here go to waste, but the algorithm as it is described in the article is clearly not suited for production systems...
Hide
Sean Owen added a comment -

Well some good code was already submitted as part of this. An experiment is worthwhile even if it merely shows something doesn't work. The code and issue remain here in JIRA. I assume everyone learned a bit as a result. I'd say close this if there's no indication that there's a submit coming in the near future. Of course, this can always be found and reopened by the contributors or anyone that wants to revisit it.

Show
Sean Owen added a comment - Well some good code was already submitted as part of this. An experiment is worthwhile even if it merely shows something doesn't work. The code and issue remain here in JIRA. I assume everyone learned a bit as a result. I'd say close this if there's no indication that there's a submit coming in the near future. Of course, this can always be found and reopened by the contributors or anyone that wants to revisit it.
Hide
Sebastian Schelter added a comment -

Unfortunately it is already that commited code that (to my impression) does not fit the MapReduce paradigm and should be removed...

Show
Sebastian Schelter added a comment - Unfortunately it is already that commited code that (to my impression) does not fit the MapReduce paradigm and should be removed...

## People

• Assignee:
Sebastian Schelter
Reporter:
Tillmann Fiehn
1 Vote for this issue
Watchers:
1 Start watching this issue

## Dates

• Created:
Updated:
Resolved:

## Time Tracking

Estimated:
672h
Remaining:
672h
Logged:
Not Specified