Mahout / MAHOUT-812

Allow ConfusionMatrix to be Writable (via MatrixWritable)

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6
    • Component/s: None
    • Labels: None

      Description

      ConfusionMatrix does not support Writable. This patch adds that feature by making ConfusionMatrix a subclass of MatrixWritable.

      Since ConfusionMatrix is of little use without its row/column labels, and MatrixWritable does not write the label bindings (it saves only the numeric values), this patch addresses both.

      Includes a unit test for ConfusionMatrix (previously missing), which also exercises MatrixWritable support for numbers and labels. (There is no independent unit test for MatrixWritable.)

      1. MAHOUT-812.patch
        13 kB
        Lance Norskog
      2. MAHOUT-812.patch
        16 kB
        Lance Norskog
      3. MAHOUT-812.patch
        14 kB
        Sean Owen

          Activity

          Sean Owen added a comment -

          In general, Matrix implementations are not Writable. MatrixWritable, however, is a container for a Matrix which is Writable. The idea is that MatrixWritable handles the serialization. This has a few nice properties, primarily that Matrix/Vector are not quite so coupled to Hadoop. Can MatrixWritable therefore be enhanced to write labels? That would be of general use beyond ConfusionMatrix.

          Ted Dunning added a comment -

          Sounds right to handle the labels.

          Lance Norskog added a comment -

          Yeah, it did not seem right to me either. What about: ConfusionMatrix has a delegate Matrix (Dense) and is disjoint from Writable-ness. You pull out the delegate and wrap that in a MatrixWritable.

          Would an R-style dataframe work better than these Matrix & Vector formats? One common format for Vectors, Matrices, and possibly tensors?

          Sean Owen added a comment -

          Yeah, something like that – though look at how MatrixWritable handles things now. You may not have to go to the trouble of making a delegate. It ought to interrogate the object as normal through its methods.

          Lance Norskog added a comment -

          Rewritten as per comments. ConfusionMatrix is not Writable. MatrixWritable now handles labels.

          Added unit test for Names with NamedVectors.
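The arrangement described in this comment, where the matrix type stays free of serialization code and a separate class writes both the numbers and the labels, might look roughly like the sketch below. It is not the code from the patch: `LabeledMatrix` and `LabeledMatrixIO` are hypothetical names, the byte layout is invented for illustration, and plain `java.io.DataInput`/`DataOutput` stand in for the Hadoop `Writable` plumbing so the example runs without Hadoop or Mahout on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a labeled matrix; it knows nothing about I/O. */
final class LabeledMatrix {
  final double[][] values;
  final Map<String, Integer> rowLabels;     // label -> row index
  final Map<String, Integer> columnLabels;  // label -> column index

  LabeledMatrix(double[][] values,
                Map<String, Integer> rowLabels,
                Map<String, Integer> columnLabels) {
    this.values = values;
    this.rowLabels = rowLabels;
    this.columnLabels = columnLabels;
  }
}

/** Hypothetical serializer: all reading and writing lives here, not in the matrix. */
final class LabeledMatrixIO {

  static void write(LabeledMatrix m, DataOutput out) throws IOException {
    int rows = m.values.length;
    int cols = rows == 0 ? 0 : m.values[0].length;
    out.writeInt(rows);
    out.writeInt(cols);
    for (double[] row : m.values) {
      for (double v : row) {
        out.writeDouble(v);
      }
    }
    writeLabels(m.rowLabels, out);
    writeLabels(m.columnLabels, out);
  }

  static LabeledMatrix read(DataInput in) throws IOException {
    int rows = in.readInt();
    int cols = in.readInt();
    double[][] values = new double[rows][cols];
    for (int r = 0; r < rows; r++) {
      for (int c = 0; c < cols; c++) {
        values[r][c] = in.readDouble();
      }
    }
    Map<String, Integer> rowLabels = readLabels(in);
    Map<String, Integer> columnLabels = readLabels(in);
    return new LabeledMatrix(values, rowLabels, columnLabels);
  }

  private static void writeLabels(Map<String, Integer> labels, DataOutput out) throws IOException {
    out.writeInt(labels.size());
    for (Map.Entry<String, Integer> e : labels.entrySet()) {
      out.writeUTF(e.getKey());
      out.writeInt(e.getValue());
    }
  }

  private static Map<String, Integer> readLabels(DataInput in) throws IOException {
    int n = in.readInt();
    Map<String, Integer> labels = new LinkedHashMap<>();
    for (int i = 0; i < n; i++) {
      String label = in.readUTF();
      labels.put(label, in.readInt());
    }
    return labels;
  }

  /** Serializes to an in-memory buffer and reads the result back. */
  static LabeledMatrix roundTrip(LabeledMatrix m) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      write(m, new DataOutputStream(buffer));
      return read(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A round trip through `LabeledMatrixIO.roundTrip` should give back equal values and equal label maps, which is the kind of check a unit test for label support would make.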

          Sean Owen added a comment -

          OK, this is looking reasonable.

          Some of the formatting needs to be cleaned up – some lines are not indented or indented with tabs, etc. What's up with calling hashCode() on an array and not using the result?
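For readers wondering why that hashCode() call looks suspicious: in Java, an array inherits hashCode() from Object, so the value is identity-based and says nothing about the contents, and a call whose result is discarded has no visible effect at all. The sketch below is only a general illustration of that language behaviour (the class name `ArrayHashDemo` is made up, and the patch's actual code is not shown on this page).

```java
import java.util.Arrays;

/** Hypothetical demo class; not taken from the patch under review. */
final class ArrayHashDemo {

  /** Content-based hash: equal contents give equal hashes. */
  static int contentHash(int[] a) {
    return Arrays.hashCode(a);
  }

  public static void main(String[] args) {
    int[] a = {1, 2, 3};
    int[] b = {1, 2, 3};

    a.hashCode(); // result discarded: this statement does nothing useful

    // Identity-based: two distinct arrays will normally hash differently,
    // even though their contents are equal.
    System.out.println(a.hashCode() == b.hashCode());

    // Content-based: guaranteed equal for arrays with equal contents.
    System.out.println(contentHash(a) == contentHash(b));
  }
}
```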

          MatrixWritable looks good, along with the rest of the test code.

          In ConfusionMatrix: why add 0.0001 to all elements? Is it so that the doubles are converted back to ints properly on the other side? I'd do a round then.
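To make the difference concrete, here is a small sketch comparing the two ways of turning a stored double back into an integer count. The method names are hypothetical and the epsilon variant is a reconstruction of the idea being questioned, not the code from the patch.

```java
/** Hypothetical helper; illustrates the two conversions discussed above. */
final class CountRounding {

  /** Add a small epsilon, then truncate toward zero. */
  static int epsilonThenTruncate(double v) {
    return (int) (v + 0.0001);
  }

  /** Round to the nearest integer. */
  static int roundToCount(double v) {
    return Math.toIntExact(Math.round(v));
  }

  public static void main(String[] args) {
    double tinyError = 3.0 - 1e-9;  // off by far less than the epsilon
    double largerError = 2.9998;    // off by more than the epsilon

    // Both approaches recover 3 when the error is tiny.
    System.out.println(epsilonThenTruncate(tinyError) + " " + roundToCount(tinyError));

    // Only rounding recovers 3 once the error exceeds the epsilon.
    System.out.println(epsilonThenTruncate(largerError) + " " + roundToCount(largerError));
  }
}
```

The epsilon trick only works while the error stays below the chosen constant, whereas rounding tolerates any error under 0.5, so it does not depend on a magic number.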

          Attached is my take on the patch.

          Hudson added a comment -

          Integrated in Mahout-Quality #1079 (See https://builds.apache.org/job/Mahout-Quality/1079/)
          MAHOUT-812 help make confusion matrix writable

          srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178324
          Files :

          • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/ConfusionMatrix.java
          • /mahout/trunk/core/src/main/java/org/apache/mahout/math/MatrixWritable.java
          • /mahout/trunk/core/src/test/java/org/apache/mahout/classifier/ConfusionMatrixTest.java
          • /mahout/trunk/core/src/test/java/org/apache/mahout/math/MatrixWritableTest.java
          • /mahout/trunk/core/src/test/java/org/apache/mahout/math/VectorWritableTest.java

            People

            • Assignee:
              Sean Owen
              Reporter:
              Lance Norskog
            • Votes:
              0
              Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved:
