Mahout / MAHOUT-1343

JSON output format for clusterdumper

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 0.8
    • Fix Version/s: 0.9
    • Component/s: Clustering, Integration
    • Labels:

      Description

      This patch adds a JSON output format to the clusterdump utility. Each cluster is written as one JSON-encoded line. An example invocation:

      >> mahout clusterdump -d dictionary -dt text -i clusters/clusters-2-final -p clusters/clusteredPoints -n 10 -o clusterdump.json -of JSON
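
      Because each cluster is written as one JSON-encoded line, the output file can be read line by line. The sketch below is only an illustration: the function name read_cluster_lines is hypothetical, and it makes no assumption about which fields each cluster object contains, since the description does not list them.

```python
import json


def read_cluster_lines(lines):
    """Parse clusterdump JSON output, assuming one JSON object per line.

    `lines` is any iterable of strings, such as an open file handle.
    Returns a list with one parsed object per non-blank line.
    """
    clusters = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            clusters.append(json.loads(line))
        except json.JSONDecodeError as err:
            # Report which line failed so a truncated dump is easy to spot.
            raise ValueError(f"line {line_no} is not valid JSON: {err}") from err
    return clusters
```

      Assuming the file name from the command above, it could be used as: with open("clusterdump.json") as f: clusters = read_cluster_lines(f).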

      1. clusterdump-example.json
        2 kB
        Telvis Calhoun
      2. MAHOUT-1343.patch
        10 kB
        Telvis Calhoun
      3. MAHOUT-1343.patch
        18 kB
        Telvis Calhoun

        Activity

        Telvis Calhoun added a comment -

        Here is a pretty-printed example of the JSON-formatted output from clusterdump -of JSON.

        Telvis Calhoun added a comment - edited

        Patch containing modifications to ClusterDumper, the JSONClusterWriter class, and a unit test.

        Stevo Slavic added a comment -

        Since the proposed JsonClusterWriter doesn't make use of subString (just like the existing CSVClusterWriter), consider omitting it from the class, the constructor, and the constructor call.
        There is some trailing whitespace introduced in the new JSON case in ClusterDumper, and a tiny indentation error in the unit test.
        Apart from these tiny bits, the patch looks OK to me.

        Telvis Calhoun added a comment -

        Understood. I'll make those changes and resubmit. Thank you for the feedback, Stevo.

        Telvis Calhoun added a comment -

        Patch containing modifications to ClusterDumper, the JSONClusterWriter class, and a unit test. Also contains changes based on peer review feedback.

        Suneel Marthi added a comment - edited

        Stevo Slavic, could you review this patch and commit it? Thanks.

        Hudson added a comment -

        SUCCESS: Integrated in Mahout-Quality #2310 (See https://builds.apache.org/job/Mahout-Quality/2310/)
        MAHOUT-1343: JSON output format support in cluster dumper (sslavic: rev 1538406)

        • /mahout/trunk/CHANGELOG
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/clustering/ClusterDumper.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/clustering/JsonClusterWriter.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java
        Stevo Slavic added a comment -

        Patch has been integrated. Thanks Telvis Calhoun for providing it!

        Resolving issue as implemented.

        Hudson added a comment -

        SUCCESS: Integrated in Mahout-Quality #2311 (See https://builds.apache.org/job/Mahout-Quality/2311/)
        Mahout-1343: more minor cleanups (smarthi: rev 1538424)

        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/clustering/ClusterDumper.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java

        Mahout-1343: minor code cleanups (smarthi: rev 1538421)

        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/clustering/JsonClusterWriter.java
        Hudson added a comment -

        SUCCESS: Integrated in Mahout-Quality #2324 (See https://builds.apache.org/job/Mahout-Quality/2324/)
        MAHOUT-1343: Replaced deprecated Lucene 3.x API with equivalent Lucene 4.x API. (smarthi: rev 1542647)

        • /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java
        Hudson added a comment -

        SUCCESS: Integrated in Mahout-Quality #2337 (See https://builds.apache.org/job/Mahout-Quality/2337/)
        MAHOUT-1343: More Lucene 3.x calls that need to be replaced by equivalent Lucene 4.x API (smarthi: rev 1546288)

        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/lucene/CachedTermInfo.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/utils/vectors/lucene/CachedTermInfoTest.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/utils/vectors/lucene/LuceneIterableTest.java

          People

          • Assignee: Suneel Marthi
          • Reporter: Telvis Calhoun
          • Votes: 1
          • Watchers: 5
