Mahout
  1. Mahout
  2. MAHOUT-756

VectorList (Matrix implementation) does not maintain cardinality getters correctly

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6
    • Component/s: Math
    • Labels:
      None

      Description

      VectorList (implements Matrix) is dynamically expandable, row-wise. There are three different ways to query the size of a Matrix, and VectorList does not correctly supply these values.

      1. VectorList.patch
        9 kB
        Lance Norskog

        Activity

        Sean Owen made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hide
        Hudson added a comment -

        Integrated in Mahout-Quality #936 (See https://builds.apache.org/job/Mahout-Quality/936/)
        MAHOUT-756 kill implementation due to several small issues

        srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1145795
        Files :

        • /mahout/trunk/math/src/test/java/org/apache/mahout/math/VectorListTest.java
        • /mahout/trunk/math/src/main/java/org/apache/mahout/math/VectorList.java
        Show
        Hudson added a comment - Integrated in Mahout-Quality #936 (See https://builds.apache.org/job/Mahout-Quality/936/ ) MAHOUT-756 kill implementation due to several small issues srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1145795 Files : /mahout/trunk/math/src/test/java/org/apache/mahout/math/VectorListTest.java /mahout/trunk/math/src/main/java/org/apache/mahout/math/VectorList.java
        Sean Owen made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Assignee Sean Owen [ srowen ]
        Fix Version/s 0.6 [ 12316364 ]
        Resolution Fixed [ 1 ]
        Hide
        Sean Owen added a comment -

        ... in the sense that I removed it.

        Show
        Sean Owen added a comment - ... in the sense that I removed it.
        Hide
        Sean Owen added a comment -

        UnsupportedOperationException can't be right – this operation is supportable.

        @Ted: yes it doesn't seem right to use DenseVector; that's what it does now though. The way out is probably to add a like(int) method to Vector.

        I agree that there are several potential additional issues with this class (like exceptions on access).

        Here's the kicker: nothing uses this class. Given this thread alone I'd suggest it's better to remove it and start again later if desired.

        Show
        Sean Owen added a comment - UnsupportedOperationException can't be right – this operation is supportable. @Ted: yes it doesn't seem right to use DenseVector; that's what it does now though. The way out is probably to add a like(int) method to Vector. I agree that there are several potential additional issues with this class (like exceptions on access). Here's the kicker: nothing uses this class. Given this thread alone I'd suggest it's better to remove it and start again later if desired.
        Hide
        Lance Norskog added a comment -

        When it is this confusing, UnsupportedOperationException is permissible. There is no obvious right thing to do. What if there are different Vector classes and some empty slots?

        Using DenseVector/Matrix is the default policy in other places, so let's go with that. If your program runs out of memory, you'll eventually find all of the DenseVectors and track down the problem. In the meantime, it will be slow but will work.

        Show
        Lance Norskog added a comment - When it is this confusing, UnsupportedOperationException is permissible. There is no obvious right thing to do. What if there are different Vector classes and some empty slots? Using DenseVector/Matrix is the default policy in other places, so let's go with that. If your program runs out of memory, you'll eventually find all of the DenseVectors and track down the problem. In the meantime, it will be slow but will work.
        Hide
        Sean Owen added a comment -

        OK, I see: like(int, int) used a DenseVector all the time since it wanted to specify the cardinality concretely. I think it's OK to leave that as-is, as it's not obviously a problem, rather than change its behavior. It's not necessary to make the like() method always use DenseVector though? There it's fine to call Vector.like().

        The two methods were not consistent in this regard – you flipped them both to do what the other did. Consistency might be good here... I don't know. Always use DenseVector, or use like() in the one case where you can?

        Show
        Sean Owen added a comment - OK, I see: like(int, int) used a DenseVector all the time since it wanted to specify the cardinality concretely. I think it's OK to leave that as-is, as it's not obviously a problem, rather than change its behavior. It's not necessary to make the like() method always use DenseVector though? There it's fine to call Vector.like(). The two methods were not consistent in this regard – you flipped them both to do what the other did. Consistency might be good here... I don't know. Always use DenseVector, or use like() in the one case where you can?
        Hide
        Sean Owen added a comment -

        I like most of this cleanup. I think we could leave the getQuick()/setQuick() methods as-is – no need for additional arg checking. Their intent is to be used when the caller knows it's OK.

        I don't know that it's a problem that this allows different rows to be of different types. I don't think any caller uses it that way anyway.

        It is interesting what "like" means for the method that specifies a new number of rows – what are the types of the new rows? Right now it forces them to DenseVector which doesn't feel right if the other like() method does not do the same. On the other hand, you're requesting a different number of rows, which means the type of new rows aren't defined anyway, so how much can the caller care?

        I think the least surprising thing for the caller, who will normally have an object with all one type of vector, is to have a result of the same type of vector. So, in that sense I think your change is good.

        But there isn't a need for a new likeVector() method; call Vector.like() as the other method does.

        Let me see what happens under these changes.

        Show
        Sean Owen added a comment - I like most of this cleanup. I think we could leave the getQuick()/setQuick() methods as-is – no need for additional arg checking. Their intent is to be used when the caller knows it's OK. I don't know that it's a problem that this allows different rows to be of different types. I don't think any caller uses it that way anyway. It is interesting what "like" means for the method that specifies a new number of rows – what are the types of the new rows? Right now it forces them to DenseVector which doesn't feel right if the other like() method does not do the same. On the other hand, you're requesting a different number of rows, which means the type of new rows aren't defined anyway, so how much can the caller care? I think the least surprising thing for the caller, who will normally have an object with all one type of vector, is to have a result of the same type of vector. So, in that sense I think your change is good. But there isn't a need for a new likeVector() method; call Vector.like() as the other method does. Let me see what happens under these changes.
        Lance Norskog made changes -
        Summary VectorList (Matrix implementation) does maintain cardinality getters correctly VectorList (Matrix implementation) does not maintain cardinality getters correctly
        Hide
        Lance Norskog added a comment -

        Perhaps like() and like(row, columns) should just make an empty VectorList? After all, like() is not clone(). (Which VectorList does not implement and should.)

        Show
        Lance Norskog added a comment - Perhaps like() and like(row, columns) should just make an empty VectorList? After all, like() is not clone(). (Which VectorList does not implement and should.)
        Lance Norskog made changes -
        Field Original Value New Value
        Attachment VectorList.patch [ 12486012 ]
        Hide
        Lance Norskog added a comment -

        VectorList.java:

        • Fixed maintenance of cardinality values. Removed local copy of # of columns.
        • Pre-allocate rows when given # of rows in constructor.
        • Work around "like" problems: the class allows different vectors to be different classes, and so making a 'like' Matrix with more rows is impossible.
        • When accessing an empty row, change NullPointerException to IllegalStateException.

        VectorListTest.java:
        Added a bunch of tests for dynamic row additions.

        Show
        Lance Norskog added a comment - VectorList.java: Fixed maintenance of cardinality values. Removed local copy of # of columns. Pre-allocate rows when given # of rows in constructor. Work around "like" problems: the class allows different vectors to be different classes, and so making a 'like' Matrix with more rows is impossible. When accessing an empty row, change NullPointerException to IllegalStateException. VectorListTest.java: Added a bunch of tests for dynamic row additions.
        Lance Norskog created issue -

          People

          • Assignee:
            Sean Owen
            Reporter:
            Lance Norskog
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development