Mahout / MAHOUT-863

Add DisplayMinhash clustering example

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6
    • Component/s: None

      Description

      We've got simple GUI tools for many of the clustering algorithms; we should add one for MinHash, too.

      1. MAHOUT-863.patch
        12 kB
        Miroslav Pankov
      2. MAHOUT-863.patch
        12 kB
        Miroslav Pankov
      3. MAHOUT-863.patch
        6 kB
        Grant Ingersoll

        Activity

        Hudson added a comment -

        Integrated in Mahout-Quality #1304 (See https://builds.apache.org/job/Mahout-Quality/1304/)
        MAHOUT-863: add display for min hash clustering

        gsingers : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231065
        Files :

        • /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/display/DisplayMinHash.java
        Grant Ingersoll added a comment -

        Committing, as this is straightforward and helps some students out.

        Lance Norskog added a comment -

        Just tested this; works great. Thanks for writing it. Maven needs a little more memory than the default, but not much.

        cd examples
        export MAVEN_OPTS=-Xmx200m
        mvn -q exec:java -Dexec.mainClass=org.apache.mahout.clustering.display.DisplayMinHash

        Miroslav Pankov added a comment -

        Grant,

        Our project defense date is 16/Jan/12 (this Monday). Can you give us an update on whether the patch is good and whether it will be integrated by then?

        Jeff Eastman added a comment -

        Evidently not the source of the current build problem. Moving to 0.7

        Grant Ingersoll added a comment -

        Jeff, nothing's been committed here recently.

        Jeff Eastman added a comment -

        This issue seems to be causing a problem with Jenkins, so I'm marking it as a must-fix for 0.6.

        Miroslav Pankov added a comment -

        Please check the new patch and whether it fixes the issue.

        Grant Ingersoll added a comment -

        I was just invoking the main() method via my IDE. I suspect there is just some oddity in the construction of the classes.

        Miroslav Pankov added a comment -

        I have added protection in the class against a null plotType parameter, which is what causes the null pointer exception, but I don't think this problem should occur during a normal run of the program.

        Do you create a separate instance of the DisplayMinHash class somewhere? Is the correct PlotType parameter passed?
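
A guard of the kind described above might look like the following minimal sketch. It is not the code from the patch: the class PlotState, the Mode enum, and the method names are hypothetical, and falling back to the slide-show mode is an assumption based on that mode being described as the default. The field is volatile because the paint callback runs on the AWT event thread, which may not be the thread that sets the mode.

```java
/** Hypothetical holder for the display mode; not taken from DisplayMinHash. */
final class PlotState {

  /** Illustrative mode names for the -p, -l and -s options. */
  enum Mode { POINTS, LINES, SYMBOLS }

  // May still be null if painting starts before the mode has been set.
  private volatile Mode mode;

  void setMode(Mode newMode) {
    mode = newMode;
  }

  /** Returns the mode to paint with, never null. */
  Mode effectiveMode() {
    Mode current = mode; // read the volatile field once
    return current == null ? Mode.POINTS : current; // assumed default
  }

  public static void main(String[] args) {
    PlotState state = new PlotState();
    System.out.println("Before setMode: " + state.effectiveMode());
    state.setMode(Mode.SYMBOLS);
    System.out.println("After setMode: " + state.effectiveMode());
  }
}
```

A paint method that switches on effectiveMode() instead of reading the field directly cannot hit a null mode, whatever order the threads run in.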

        Grant Ingersoll added a comment -

        FWIW, it still displays after that, probably a thread issue.

        Grant Ingersoll added a comment -

        That's when passing in -p.

        Grant Ingersoll added a comment -

        Love the sounds of this. Applied the patch and am getting:

        Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
        at org.apache.mahout.clustering.display.DisplayMinHash.plotClusters(DisplayMinHash.java:140)
        at org.apache.mahout.clustering.display.DisplayMinHash.paint(DisplayMinHash.java:132)
        at sun.awt.RepaintArea.paintComponent(RepaintArea.java:276)
        at sun.awt.RepaintArea.paint(RepaintArea.java:241)
        at apple.awt.ComponentModel.handleEvent(ComponentModel.java:263)
        at apple.awt.CWindow.handleEvent(CWindow.java:545)
        at java.awt.Component.dispatchEventImpl(Component.java:4811)
        at java.awt.Container.dispatchEventImpl(Container.java:2143)
        at java.awt.Window.dispatchEventImpl(Window.java:2478)
        at java.awt.Component.dispatchEvent(Component.java:4565)
        at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:679)
        at java.awt.EventQueue.access$000(EventQueue.java:85)
        at java.awt.EventQueue$1.run(EventQueue.java:638)
        at java.awt.EventQueue$1.run(EventQueue.java:636)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
        at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:98)
        at java.awt.EventQueue$2.run(EventQueue.java:652)
        at java.awt.EventQueue$2.run(EventQueue.java:650)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
        at java.awt.EventQueue.dispatchEvent(EventQueue.java:649)
        at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:296)
        at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:211)
        at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:201)
        at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:196)
        at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:188)
        at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)

        Miroslav Pankov added a comment -

        Here is the solution to the problem. The originally proposed approach, with lines connecting all of the points in a cluster, is included, but it doesn't look very good with the sample data, and it is very hard to tell which points belong to which cluster. We have added the following display options to the program (they are also described in the class documentation):
        1. Highlight each cluster's points in a slide show: This presentation is turned on by the -p command-line parameter. It starts a slide show that highlights the clusters one after another, each for a fixed update period. The slide show can be paused and resumed with the space key. This is the default option. The update period is configurable and is read from the second command-line parameter, which is expected to be an integer giving the number of seconds each cluster stays highlighted.
        2. Display lines between all points in a cluster: This presentation is turned on by the -l command-line parameter. Each cluster's lines have a different color, because a point can belong to more than one cluster and its memberships cannot be traced if the colors do not differ. This display doesn't look good with the sample data because the data set is very large. It is suited to viewing a small number of clusters and points.
        3. Display clusters as symbols: This presentation is turned on by the -s command-line parameter. Each cluster has a unique symbol: a character drawn in a specific (randomly chosen) color. Next to each point, the symbols of the clusters it belongs to are drawn. However, this presentation doesn't look good with the sample data either, because each point belongs to 4+ clusters. It works well when points belong to one or at most two clusters.
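
The three options above could be parsed along the lines of the following sketch. It is only an illustration of the described command-line behavior, not the code from the patch: the names PlotType and PlotOptions are hypothetical, and the one-second default update period is an assumption, since the comment does not state the default value.

```java
/** Hypothetical display modes for -p, -l and -s; names are illustrative. */
enum PlotType { POINTS, LINES, SYMBOLS }

/** Parsed command-line options for this sketch. */
final class PlotOptions {

  // Assumed default; the actual default period is not stated in the comment.
  static final int DEFAULT_PERIOD_SECONDS = 1;

  final PlotType type;
  final int updatePeriodSeconds;

  PlotOptions(PlotType type, int updatePeriodSeconds) {
    this.type = type;
    this.updatePeriodSeconds = updatePeriodSeconds;
  }

  /** First argument selects the mode, second (optional) is the period in seconds. */
  static PlotOptions parse(String[] args) {
    PlotType type = PlotType.POINTS; // the slide show is the default option
    int period = DEFAULT_PERIOD_SECONDS;
    if (args.length > 0) {
      switch (args[0]) {
        case "-p": type = PlotType.POINTS; break;
        case "-l": type = PlotType.LINES; break;
        case "-s": type = PlotType.SYMBOLS; break;
        default: throw new IllegalArgumentException("Unknown option: " + args[0]);
      }
    }
    if (args.length > 1) {
      period = Integer.parseInt(args[1]); // throws NumberFormatException if not an integer
      if (period <= 0) {
        throw new IllegalArgumentException("Update period must be positive: " + period);
      }
    }
    return new PlotOptions(type, period);
  }

  public static void main(String[] args) {
    PlotOptions options = parse(args);
    System.out.println(options.type + ", period " + options.updatePeriodSeconds + "s");
  }
}
```

In this sketch the period is accepted for every mode, although only the slide show is described as using it.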

        Grant Ingersoll added a comment -

        Hi Miroslav,

        By all means! We'd love to have the contribution. Just put up a patch on the issue when you are ready. See https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Contribute

        Miroslav Pankov added a comment -

        Hello, we are a group of students learning artificial intelligence and we have a project which we should do at the end of the course. We have chosen to contribute to the Mahout project in order to complete the course.

        Can we take this bug as our project? If not, can you suggest a suitable bug for us? Thanks in advance.

        P.S. We don't have any distributed environment or resources for it, so it would be preferable if testing the task on a single PC were enough to complete it.

        Best regards,
        Miroslav Pankov

        Hudson added a comment -

        Integrated in Mahout-Quality #1138 (See https://builds.apache.org/job/Mahout-Quality/1138/)
        MAHOUT-863: test for murmur hash

        gsingers : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1196707
        Files :

        • /mahout/trunk/math/src/test/java/org/apache/mahout/math/MurmurHash3Test.java
        Grant Ingersoll added a comment -

        Here's a start. It doesn't display the items yet, mainly because we don't have the typical centroid around which one can draw an ellipse of a certain size. My original thought was to draw lines connecting items, but I'm not making much headway on that just yet. Perhaps someone w/ a bit more Java graphics background can come up with something more useful.

        At any rate, the framework for running this is all there; it just needs a push over the edge to get the actual cluster display right.
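
The idea of drawing lines that connect the items of a cluster could be sketched as follows. This is one possible reading of that idea, not code from any of the attached patches: the class ClusterLines and the method connectAll are hypothetical, and the sketch assumes each cluster is available as a list of 2D points. It only builds the segments; drawing them would be a matter of passing each one to Graphics2D.draw in the paint method.

```java
import java.awt.geom.Line2D;
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that turns one cluster's points into line segments. */
final class ClusterLines {

  /** Returns one segment per unordered pair of points: n * (n - 1) / 2 in total. */
  static List<Line2D> connectAll(List<Point2D> points) {
    List<Line2D> segments = new ArrayList<>();
    for (int i = 0; i < points.size(); i++) {
      for (int j = i + 1; j < points.size(); j++) {
        segments.add(new Line2D.Double(points.get(i), points.get(j)));
      }
    }
    return segments;
  }

  public static void main(String[] args) {
    List<Point2D> cluster = new ArrayList<>();
    cluster.add(new Point2D.Double(0, 0));
    cluster.add(new Point2D.Double(1, 0));
    cluster.add(new Point2D.Double(0, 1));
    System.out.println(connectAll(cluster).size() + " segments");
  }
}
```

Because the number of segments grows quadratically with the cluster size, this kind of display gets crowded quickly on larger data sets.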


          People

          • Assignee: Grant Ingersoll
          • Reporter: Grant Ingersoll
          • Votes: 0
          • Watchers: 0