Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0
    • Labels: None
    • Environment: N/A

Description

I needed a way to deal with large sparse matrices using the commons-math RealMatrix, so I implemented one. SparseRealMatrixImpl is a subclass of RealMatrixImpl, and the backing data structure is a Map<Point,Double>, where Point is a struct-like inner class which exposes two int fields, row and column. I had to make some changes to the existing components to keep the code for SparseRealMatrixImpl clean. Here are the details.

      1) RealMatrix.java: added a new method setEntry(int, int, double) to set data into a matrix.
      2) RealMatrixImpl.java:
         • changed all internal references to data[i][j] to getEntry(i, j).
         • for some methods such as add(), subtract(), premultiply(), etc., there was code that checked for ClassCastException and had two versions, one for a generic RealMatrix and one for a RealMatrixImpl. This has been changed to a single version that operates on a RealMatrix. The result is something like automatic type casting, so:
           RealMatrixImpl.add(RealMatrix) returns a RealMatrixImpl;
           SparseRealMatrixImpl.add(RealMatrix) returns a SparseRealMatrixImpl.
      3) SparseRealMatrixImpl added as a subclass of RealMatrixImpl.
      4) LUDecompositionImpl changed to use a clone of the passed-in RealMatrix instead of its data[][] block; it now uses clone.getEntry(row, col) calls instead of data[row][col] calls.
      5) LUDecompositionImpl used to return RealMatrixImpl from getL(), getU(), getP() and solve(). It now returns the same RealMatrix implementation that was passed in through its constructor for these methods.
      6) New test for SparseRealMatrixImpl, mimicking the tests in RealMatrixImplTest but using SparseRealMatrixImpl.
      7) New static method MatrixUtils.createSparseRealMatrix() to create a SparseRealMatrixImpl out of a double[][].
      8) Verified that all JUnit tests pass.
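The attached SparseRealMatrix.java is the actual implementation; as a rough, hypothetical sketch of the Map<Point,Double> storage idea described above (class and field names here are illustrative only):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Map-backed sparse storage described above.
class SparseMatrixSketch {
    // Struct-like key exposing row and column, as in the description.
    static final class Point {
        final int row;
        final int column;
        Point(int row, int column) { this.row = row; this.column = column; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return row == p.row && column == p.column;
        }
        @Override public int hashCode() { return 31 * row + column; }
    }

    private final Map<Point, Double> entries = new HashMap<Point, Double>();

    // Only non-zero values are stored; absent keys read as 0.0.
    double getEntry(int row, int column) {
        Double value = entries.get(new Point(row, column));
        return value == null ? 0.0 : value;
    }

    void setEntry(int row, int column, double value) {
        Point p = new Point(row, column);
        if (value == 0.0) {
            entries.remove(p);        // keep the map sparse
        } else {
            entries.put(p, value);
        }
    }
}
```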
      Attachments

      1. math-230.diff
        49 kB
        Ismael Juma
      2. patch-2.txt
        2 kB
        Sujit Pal
      3. RealMatrixImplPerformanceTest.java
        2 kB
        Sujit Pal
      4. SparseRealMatrix.java
        11 kB
        Sujit Pal
      5. SparseRealMatrixTest.java
        25 kB
        Sujit Pal

        Issue Links

          Activity

          sujitpal Sujit Pal added a comment -

          Patch file containing diffs for java classes that are already in the BRANCH_2_0 code. List follows:

          src/java/org/apache/commons/math/linear/LUDecompositionImpl.java
          src/java/org/apache/commons/math/linear/RealMatrix.java
          src/java/org/apache/commons/math/linear/MatrixUtils.java
          src/java/org/apache/commons/math/linear/RealMatrixImpl.java

          sujitpal Sujit Pal added a comment -

          SparseRealMatrixImpl.java

          sujitpal Sujit Pal added a comment -

          SparseRealMatrixImplTest.java

          sujitpal Sujit Pal added a comment -

          Based on conversation with previous committers for the RealMatrixImpl code, I discovered that I had accidentally removed some performance enhancement code. I have put this back in and am resubmitting a fresh patch.

           In addition to the changes described in the initial comment, the changes here are:
           1) Put back the RealMatrixImpl.add(RealMatrixImpl), RealMatrixImpl.subtract(RealMatrixImpl) and RealMatrixImpl.multiply(RealMatrixImpl) methods. These are now private methods, since they are meant to be used by the class itself for performance. Having them public meant that subclasses would use them instead of their own overridden API methods, because a subclass of RealMatrixImpl is also a RealMatrixImpl, and thus a better match than a RealMatrix.
           2) Added back the code which delegates to these methods from RealMatrixImpl.add(RealMatrix), RealMatrixImpl.subtract(RealMatrix) and RealMatrixImpl.multiply(RealMatrix). Instead of a try {} catch (ClassCastException) {} block, it now uses an if (RealMatrixImpl.class.equals(m.getClass())) {} else {} block, to get around the fact that subclasses are also RealMatrixImpl instances and hence would be chosen in the try block.
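The exact-class dispatch described in (2) can be sketched as follows (hypothetical one-value "matrices" stand in for the real classes; only the dispatch shape matches the patch):

```java
// Sketch of the exact-class dispatch: an instanceof test (or a caught
// ClassCastException) would also match subclasses, so the class is
// compared exactly and subclasses fall through to the generic path.
class DispatchDemo {
    static class Dense {
        final double value;
        Dense(double value) { this.value = value; }
        double getEntry() { return value; }

        Dense add(Dense m) {
            if (Dense.class.equals(m.getClass())) {
                return new Dense(value + m.value);        // fast direct-field path
            }
            return new Dense(value + m.getEntry());       // generic accessor path
        }
    }

    static class Sparse extends Dense {
        Sparse(double value) { super(value); }
        // In the real class this would read from sparse storage.
        @Override double getEntry() { return super.getEntry(); }
    }

    public static void main(String[] args) {
        Dense a = new Dense(1.0);
        System.out.println(a.add(new Dense(2.0)).getEntry());   // fast path
        System.out.println(a.add(new Sparse(4.0)).getEntry());  // generic path
    }
}
```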

          sujitpal Sujit Pal added a comment -

          svn diff containing diffs against trunk for the following files:

          src/java/org/apache/commons/math/linear/LUDecompositionImpl.java
          src/java/org/apache/commons/math/linear/RealMatrix.java
          src/java/org/apache/commons/math/linear/MatrixUtils.java
          src/java/org/apache/commons/math/linear/RealMatrixImpl.java

          sujitpal Sujit Pal added a comment -

          SparseRealMatrixImpl.java – subclasses RealMatrixImpl.java with a backing data store of Map<Point,Double>.

          sujitpal Sujit Pal added a comment -

          Test case for SparseRealMatrixImpl.java – mimics tests in RealMatrixImplTest, against a SparseRealMatrixImpl.

          sujitpal Sujit Pal added a comment -
           *** Not for inclusion in svn, only meant to be used if necessary for verification of the times below ***

           Quick and dirty test to measure wall-clock times for multiplying RealMatrixImpls using the original method (currently in svn), the getEntry() method (from the original patch submission), and the rolled-back method (using data[][] as a fallback). My test times on a 64-bit AMD dual-core laptop with 2GB RAM are shown below:

           size                  data[][]  getEntry()  restored data[][]
           [3x2]*[2x4]           0ms       0ms         0ms
           [6x4]*[4x8]           0ms       0ms         0ms
           [12x8]*[8x16]         1ms       0ms         0ms
           [24x16]*[16x32]       2ms       2ms         2ms
           [48x32]*[32x64]       14ms      16ms        23ms
           [96x64]*[64x128]      2ms       3ms         3ms
           [192x128]*[128x256]   24ms      50ms        28ms
           [384x256]*[256x512]   770ms     813ms       768ms

          ijuma Ismael Juma added a comment -

           One issue with the proposed implementation is that it causes excessive object allocation. Each non-zero entry involves a HashMap.Entry, a Point and a Double. This is quite inefficient in terms of memory usage.

          One alternative would be to use a specialised HashMap with open addressing that would only take double primitives as values (similar to TObjectDoubleHashMap from Trove). Using this approach, the only object would be Point. It's possible to avoid that too, but that's more complicated (look at Colt's SparseDoubleMatrix2D for an example).

          sujitpal Sujit Pal added a comment -

          Thanks for the pointer to TObjectDoubleHashMap. So basically what you
          are suggesting is for the incoming (x,y) object, whatever that is, to be
          hashed to a single int object which is modded against the internal
          one-dimensional array length. Finally the hash/mod result needs to be
          checked against the internal double array to see if it is already
          occupied, and if so, another location should be found.

          Here are the relevant links:
          http://trove4j.cvs.sourceforge.net/viewvc/trove4j/trove/src/gnu/trove/TObjectDoubleHashMap.java?hideattic=0&revision=1.16&view=markup
          http://trove4j.cvs.sourceforge.net/viewvc/trove4j/trove/src/gnu/trove/TObjectHash.java?hideattic=0&revision=1.26&view=markup

          That /would/ definitely decrease the size of the matrix, but would make
          the get/set code quite complex, unless one could use (something like)
          trove4j as a dependency.

          Rather than add this impl, then, it may be better to open the door for
          extension (via the suggestion in MATH-231) and let the client (who will
          probably find it easier to add different jars as dependencies) do the
          extension as he desires?

          ijuma Ismael Juma added a comment -

          Or we could just add the appropriate specialised Map to commons-math. Are there objections to that? I could provide one, it's relatively easy to do. I think the best approach might even be one not mentioned above. Use a Map.Entry object with two ints and a double. This has a few advantages:

          (1) We use chaining instead of open addressing, so we can use a higher load factor with good performance (e.g. 0.75 instead of 0.50). This should mitigate the additional memory used by the Map.Entry object.

          (2) We use a single array of Map.Entry objects instead of two arrays (one for keys and one for values). When testing the performance for Scala HashMap implementations (which converts to bytecode very similar to the one generated by Java), it seems that using two arrays performs quite a bit worse (although the test was using Strings instead of primitives for the value)[1].

          (3) Gets don't require any boxing if we provide a get(int, int) method (this is the same as using open addressing with primitives only though).

          A rough sample of the implementation outline:

           class IntPairToDoubleHashMap {
               static class Entry {
                   final int firstKey;
                   final int secondKey;
                   double value;
               }

               int size;
               Entry[] entries;

               double get(int firstKey, int secondKey)
               double put(int firstKey, int secondKey, double value)
           }

          This would be an internal map for usage by SparseRealMatrixImpl. I think HotSpot would be able to inline calls to it, but we could also just have the relevant code in SparseMatrixImpl. I think it's important to provide an implementation that has the right performance characteristics by default.

          What do you think?

          [1] http://www.nabble.com/-scala--Open-versus-Chained-HashMap-td19254845.html
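A minimal runnable sketch of the chained map outlined above (hypothetical simplifications: fixed bucket count, no resizing or load-factor handling):

```java
// Minimal sketch of a chained int-pair-to-double map, per the outline above.
class IntPairToDoubleHashMap {
    static final class Entry {
        final int firstKey;
        final int secondKey;
        double value;
        Entry next;                          // chaining instead of open addressing
        Entry(int firstKey, int secondKey, double value, Entry next) {
            this.firstKey = firstKey;
            this.secondKey = secondKey;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry[] buckets = new Entry[64];  // fixed size for brevity
    private int size;

    private int bucketOf(int firstKey, int secondKey) {
        int h = 31 * firstKey + secondKey;
        return (h & 0x7fffffff) % buckets.length;
    }

    // No boxing on the read path: primitives in, primitive out.
    double get(int firstKey, int secondKey) {
        for (Entry e = buckets[bucketOf(firstKey, secondKey)]; e != null; e = e.next) {
            if (e.firstKey == firstKey && e.secondKey == secondKey) {
                return e.value;
            }
        }
        return 0.0;                          // missing entries read as zero
    }

    // Returns the previous value, or 0.0 if the key pair was absent.
    double put(int firstKey, int secondKey, double value) {
        int b = bucketOf(firstKey, secondKey);
        for (Entry e = buckets[b]; e != null; e = e.next) {
            if (e.firstKey == firstKey && e.secondKey == secondKey) {
                double old = e.value;
                e.value = value;
                return old;
            }
        }
        buckets[b] = new Entry(firstKey, secondKey, value, buckets[b]);
        size++;
        return 0.0;
    }

    int size() { return size; }
}
```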

          sujitpal Sujit Pal added a comment -

          >> Or we could just add the appropriate specialised Map to commons-math. Are there objections to that?
           I cannot answer this one, since I am not a developer on the project - my observation is that there are no dependencies for this project save the JDK, which leads me to believe that there may have been a conscious decision to keep it that way, but I could be wrong. As for adding it into commons-math, my personal opinion is that it belongs in a data structure package such as commons-collections, but that's just my view.

          >> I think it's important to provide an implementation that has the right performance characteristics
          >> by default.
           I completely agree. And the open/chained hashmap would definitely be more appropriate than my Map<Point,Double> in a memory-constrained environment, which is where a SparseMatrix would be considered in the first place.

          And if it is implemented as a private inner class, then it is visible only from within the implementation, so I think there should be no issues. And it would be completely justifiable, since we are basically creating a fast data structure to serve the impl.

           However, it is already assigned to be reviewed, so I am guessing that when Luc gets to this bug, he will post his review. FWIW, I think that if we do end up including the SparseMatrixImpl, then your approach with the open/chained hashmap would definitely be preferable to the one that's in there. If you have the bandwidth to build it, then I think it may be quite helpful.

          ijuma Ismael Juma added a comment -

          Yes, when I said "add the appropriate specialised Map to commons-math", I meant it as an internal implementation detail of SparseRealMatrixImpl. I will try to post something here during the weekend and we'll let Luc decide.

          ijuma Ismael Juma added a comment -

          I looked into this more carefully and I have a few comments (note that I don't speak for commons-math, just an interested onlooker who might use it in the near future):

          • There have been a few commits after the patch was generated that make it not apply anymore.
          • I am not sure if extending RealMatrixImpl is the right approach. Stuff like the following is a red flag.

           public double[][] getData() {
               throw new UnsupportedOperationException(
                   "Not supported because sparse matrices are usually chosen when "
                   + "the matrix to be modeled is too large to fit into memory, trying "
                   + "to build the data array from this may result in an OutOfMemoryException.");
           }

          Also, it seems to override most (if not all) of the methods. Is it perhaps better to introduce an AbstractRealMatrix containing the shared methods and have both RealMatrixImpl and SparseRealMatrix extend it? I notice that this won't fix the getData() issue though, would it make sense to move that method into the dense matrix? (only possible if 2.0 can break compatibility).

          • The following code in SparseRealMatrixImpl.setEntry does not need the containsKey call, remove can be called directly (avoiding an unnecessary look-up):
            // if an entry exists, remove it
            if (entries.containsKey(point)) { entries.remove(point); }

          There is another case of containsKey that is unnecessary because it's followed by a get after it, but once the specialised map exists the code will be even simpler so we can leave that for later.
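The suggested simplification rests on java.util.Map.remove being a no-op for absent keys, so the guard is redundant. A quick demonstration (java.awt.Point is used here only as a stand-in for the patch's inner Point class):

```java
import java.awt.Point;  // stand-in for the patch's inner Point class
import java.util.HashMap;
import java.util.Map;

class RemoveGuardDemo {
    public static void main(String[] args) {
        Map<Point, Double> entries = new HashMap<Point, Double>();
        Point point = new Point(1, 2);
        entries.put(point, 3.5);

        // Before: guarded remove (two hash look-ups)
        // if (entries.containsKey(point)) { entries.remove(point); }

        // After: one look-up; removing an absent key safely does nothing
        entries.remove(point);
        entries.remove(point);

        System.out.println(entries.isEmpty());
    }
}
```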

          ijuma Ismael Juma added a comment -

           An initial untested implementation of the IntPairToDoubleHashMap. It doesn't have javadoc and does not implement a custom serialization mechanism yet. If Luc agrees that this class should be used, I will add some tests and polish it up.

          ijuma Ismael Juma added a comment -

           Instead of using an IntPairToDoubleHashMap, we can use an IntToDoubleHashMap and simply compute (row * columnCount + column) to find the key. This means that we have one less int in each Entry object. With this change, it starts to look like an open addressed hash map might be better. I will work on that next.

          luc Luc Maisonobe added a comment -

          I will try to fix MATH-231 soon if possible. This could help simplify the fix for this issue as well.
          I will let you polish your solution and check it afterwards.
          Please, don't assume I am the one who decides, here. I am just one developer among others in this community project.

          ijuma Ismael Juma added a comment -

          I wasn't aware of that ticket. Indeed that would simplify things. Thanks for looking into it.

          luc Luc Maisonobe added a comment -

           Issue MATH-231 has been fixed now.
           Could you update your patch to make it consistent with the new abstract class?

          ijuma Ismael Juma added a comment -

          This attachment includes:

          • SparseRealMatrixImpl from Sujit (renamed to SparseRealMatrix).
          • Changes to SparseRealMatrix to use AbstractRealMatrix.
          • Changes to SparseRealMatrixTest to use LUSolver.
           • OpenIntToDoubleHashMap as a static inner class of SparseRealMatrix. The former is an open addressed hash map that does not create any wrapper objects. It uses an int[], a double[] and a byte[] to store the entries. The hashing scheme is based on Python's dict (a highly optimised hash map written in C, because the runtime uses it for almost everything).
          • OpenIntToDoubleHashMapTest.
          • Changes to SparseRealMatrix to use OpenIntToDoubleHashMap.
          • MatrixUtils.createSparseRealMatrix from Sujit's patch.
          • Changes to a couple of check methods in AbstractRealMatrix so that they're protected final instead of private.

          There are a few things from Sujit's patch that this one does not include:

          • RealMatrixImpl changes. I don't think these are necessary anymore.
          • MatrixUtils.getRealMatrixInstance (not sure what this is for).
          • LUDecomposition changes.
          • In SparseRealMatrixImpl.setSubMatrix, there were some additional checks if entries.size() == 0. Are those still needed?

          Because of the missing LUDecomposition changes, some tests in SparseRealMatrixTest are failing. I am going to be away for a few days, so if Sujit or anyone else wants to update LUDecomposition to work with any matrix type, that would be good. On the other hand, perhaps those changes could be in a separate bug since adding the SparseRealMatrix is a decent amount of work on its own.
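A hedged sketch of the open-addressed, parallel-array layout described above (simplified assumptions: fixed power-of-two capacity, no resizing or removal, and plain linear probing rather than the Python-dict perturbation scheme the real OpenIntToDoubleHashMap uses):

```java
// Simplified sketch of an open-addressed int-to-double map with parallel
// arrays and no wrapper objects, as described in the comment above.
class OpenIntToDoubleSketch {
    private static final byte FREE = 0;
    private static final byte FULL = 1;

    private final int[] keys = new int[64];      // capacity must exceed entry count
    private final double[] values = new double[64];
    private final byte[] states = new byte[64];  // slot status, no boxing anywhere

    // Linear probing: walk forward from the hashed slot until we find the
    // key or an empty slot.
    private int slotOf(int key) {
        int index = (key & 0x7fffffff) & (keys.length - 1);
        while (states[index] == FULL && keys[index] != key) {
            index = (index + 1) & (keys.length - 1);
        }
        return index;
    }

    double get(int key) {
        int index = slotOf(key);
        return states[index] == FULL ? values[index] : 0.0;  // absent reads as zero
    }

    void put(int key, double value) {
        int index = slotOf(key);
        keys[index] = key;
        values[index] = value;
        states[index] = FULL;
    }
}
```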

          ijuma Ismael Juma added a comment -

          Added a few more tests to increase coverage to 100% and add License header to one file.

          sujitpal Sujit Pal added a comment -

          Thanks for doing this Ismael – I will take a look at the patch tonight as well as the new code that Luc has put in for MATH-231 and update it. Thanks for MATH-231 Luc.

          luc Luc Maisonobe added a comment -

           I have reviewed the patch, and I am basically OK with it.
           The only thing that bothers me is the design decision to use a static inner class for OpenIntToDoubleHashMap. Since we already have a util package with a ResizableDoubleArray, I think it could be put there too and referenced from SparseRealMatrix. It doesn't seem to be too specialized a storage for matrices, and it even uses a single integer as a key. So it could perhaps be made public.
           Bill Barker also said he would review the patch on the dev list. Let's wait for his comments.

          ijuma Ismael Juma added a comment -

          I don't have a strong opinion either way. It currently only implements the methods that are needed by SparseRealMatrix so if we were to move it to the util package, it would probably be a good idea to add at least two methods:

          • boolean containsKey(int) since map.get(key) == 0.0 only works if one never inserts 0.0 into the Map (which is the case for SparseRealMatrix).
          • int[] keys() that would return all the keys in the map allowing iteration.

          This would expose a basic set of operations. Convenience methods could then be added later if and when needed.
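For illustration, a hedged sketch of the two proposed methods; it backs the map with a boxed java.util.HashMap purely to show the intended semantics (a real OpenIntToDoubleHashMap would work against its internal key/state arrays instead, and the class name here is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a boxed stand-in illustrating the proposed API additions.
public class IntDoubleMapSketch {
    private final Map<Integer, Double> map = new HashMap<Integer, Double>();

    public void put(int key, double value) {
        map.put(key, value);
    }

    // Missing keys read as 0.0, as in SparseRealMatrix.
    public double get(int key) {
        Double value = map.get(key);
        return value == null ? 0.0 : value;
    }

    // Distinguishes "absent" from "0.0 was stored", which get() cannot.
    public boolean containsKey(int key) {
        return map.containsKey(key);
    }

    // All keys currently in the map, allowing external iteration.
    public int[] keys() {
        int[] out = new int[map.size()];
        int i = 0;
        for (int key : map.keySet()) {
            out[i++] = key;
        }
        return out;
    }
}
```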

          sujitpal Sujit Pal added a comment -

          Apologies for not getting back earlier on this, I had a chance to look at the diff this morning. I will work on making the SparseMatrix tests pass this evening and post the patch tomorrow morning.

          However, looking at the patch, I notice that this method is used to find the key to the internal map for storing the entry:

+ private int computeKey(int row, int column) {
+     return row * columnDimension + column;
+ }

          This effectively flattens out the matrix so a matrix like this:
          1 2 3 4
          5 6 7 8
          9 10 11 12
          is flattened out to:
          1 2 3 4 5 6 7 8 9 10 11 12

          Now if you wanted to look for entry (1,2) you look for entry (1*4 + 2) = 6. So we will always get a unique key for a given matrix position, given that by specifying the row and column dimension we are always specifying a fixed size rectangular matrix.
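As a quick check of the scheme, a minimal sketch (the column dimension of 4 matches the 3x4 example above; the constant name is mine):

```java
// Minimal sketch of the row-major flattening used by computeKey.
public class KeySketch {
    static final int COLUMN_DIMENSION = 4; // the 3x4 example matrix

    static int computeKey(int row, int column) {
        return row * COLUMN_DIMENSION + column;
    }

    public static void main(String[] args) {
        // entry (1,2) of the example matrix: 1*4 + 2 = 6
        System.out.println(computeKey(1, 2)); // prints 6
    }
}
```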

This is quite beautiful and clever (note to Ismael: thanks for doing this, and I wish I had thought of it).

          But the question is: do we still need an open-addressed map structure? It seems to me that we can now just represent the sparse matrix internally with a Map<Integer,Double>? That way we don't even have to think about whether we want to put it as an inner class or in utils.

          Thoughts?

          ijuma Ismael Juma added a comment -

Yes, we still need an open-addressed map if we want good space characteristics. A comparison of how much space each entry takes in each case follows (note that I ignore things that take constant space independent of the number of entries, such as the size field):

          For HashMap<Integer,Double>, assuming a 32-bit JVM (would be much worse for the 64-bit one) and that the map has reached the maximum load factor (in bytes):

          • Map.Entry object = 4 (key reference) + 4 (value reference) + 4 (next reference) + 4 (cached int hash) + 8 (JVM mark and klass)[1] = 24 bytes
          • Integer object = 16 bytes[1]
          • Double object = 16 bytes (the reason why Integer and Double objects take the same space is due to alignment padding in Integer)
          • Space in Map.Entry[] = 4 bytes / 0.75 = 5.33 bytes
          • Total = 61.33 bytes

          For OpenIntToDoubleHashMap, assuming a 32-bit JVM (it would be similar on a 64-bit JVM since it doesn't rely on pointers in the non-constant variables) and that the map has reached the maximum load factor (in bytes):

          • Space in byte[] = 1 byte / 0.50 (load factor) = 2 bytes
          • Space in double[] = 8 bytes / 0.50 (load factor) = 16 bytes
          • Space in int[] = 4 bytes / 0.50 (load factor) = 8 bytes
          • Total = 26 bytes

          So, HashMap takes more than 50% more space on a 32-bit JVM and quite a bit more on the 64-bit one (since it's pointer-heavy and alignment issues make it worse), and it's harder on the GC because all those Double, Integer and Map.Entry objects must be collected. Furthermore, the additional space doesn't even necessarily help with cache locality, since Map.Entry only holds references to the actual objects (unfortunately, until we get value types, a.k.a. structs, we can't implement the ideal HashMap).

          [1] http://blogs.sun.com/jrose/entry/fixnums_in_the_vm

          ijuma Ismael Juma added a comment -

          I should add that I was a bit conservative with the load factor of OpenIntToDoubleHashMap. We can reduce its space to 19.69 bytes per entry by using a load factor of 0.66 (the default for the Python dict).
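The per-entry arithmetic for the open-addressed layout can be sketched as follows: one state byte, one double value and one int key per slot, divided by the load factor (the fraction of slots occupied at capacity). The class and method names are mine, for illustration only.

```java
// Arithmetic sketch of the open-addressed map's per-entry cost.
public class OpenMapSpace {
    static double bytesPerEntry(double loadFactor) {
        // byte[] state slot + double[] value slot + int[] key slot
        return (1 + 8 + 4) / loadFactor;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerEntry(0.50)); // 26.0 bytes
        System.out.println(bytesPerEntry(0.66)); // ~19.7 bytes
    }
}
```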

          sujitpal Sujit Pal added a comment -

          Ok, thanks, I now understand the justification for the OpenIntToDoubleHashMap more fully. Thanks for the lucid explanation.

          I am uploading the following attachments:
          1) SparseMatrixImpl.java - extends AbstractRealMatrixImpl and uses the get/setEntry methods provided. Uses a static inner class OpenIntHashMap for data storage.
          2) SparseMatrixImplTest.java - all tests run except for the ones involving LUDecompositionImpl - they are testInverse() and testExamples(), commented out. The reason seems to be that LUDecompositionImpl wants the double[][] data array to work with, which is available only from a RealMatrixImpl.
          It seems simple enough to replace the various lu[i][j] accesses with local RealMatrix get/setEntry() calls, something like this:
          // lu = realMatrix.getData();
          RealMatrix rm = realMatrix.copy();
          ...
          // lu[i][j] = v;
          rm.setEntry(i, j, v);
          ... etc.
          But when I did this locally, I started getting errors about non-singularity, and quite a few other tests began to fail, so I reverted LUDecompositionImpl to the svn version.
          It may be better, as Ismael pointed out, to remove the getData() method from the RealMatrix interface and move it to RealMatrixImpl, since getData() probably only makes sense for dense matrices.
          3) Changed AbstractRealMatrix.toString() to reflect the class name of the extending class. Changed visibility of checkRowIndex and checkColumnIndex from private to protected to allow usage within SparseMatrixImpl.

          I think the next step would be to make LUDecompositionImpl work without RealMatrix.getData() and use get/setEntry() methods instead.

          sujitpal Sujit Pal added a comment -

          New SparseRealMatrix impl

          sujitpal Sujit Pal added a comment -

          test case with two tests involving LUDecompositionImpl commented out.

          sujitpal Sujit Pal added a comment -

          patch for minor changes to AbstractRealMatrixImpl

          luc Luc Maisonobe added a comment -

          I've started applying the patch with slight changes (curly braces, javadoc, and the like).
          The most important change was to extract the map class and put it into the util package, adding the containsKey() and keys() methods as suggested.

          I have random test failures occurring in testRemove2(). The failures are triggered by the assertEquals(mapSize, map.size()) in the assertPutAndGet method called at the end of testRemove2(). mapSize was 2009 while map.size was 2010. I have succeeded once (and only once) in triggering the failure under the debugger. In that case, the key was -1035761067. It appeared in the map both at index 627 with status FULL and value 0.9939107277401076 and at index 1363 with status FREE and value 0.0. In the javaMap it was associated with value 0.21972202724338397. The load factor was 0.5, so the array size was 4096 I think.

          Trying to reuse the values above in a dedicated new test with various combinations of put and remove didn't reproduce the bug.

          I don't understand what happens yet and since it appears randomly (due to the use of java.util.Random within the test), it is quite difficult to analyze.

          ijuma Ismael Juma added a comment -

          Interesting Luc. I will try to reproduce this. Thanks for the additional information.

          ijuma Ismael Juma added a comment -

          After running the test suite a few times, I also got a failure. I should be able to create a test case that does not involve randomness by adding some code to track puts, gets and removes. By the way, the reason why the initial tests involved random numbers is precisely their ability to find weird corner cases like this (similar to QuickCheck and ScalaCheck, but done manually). They are not the best way to narrow down the problem though.

          ijuma Ismael Juma added a comment - - edited

          That was a tricky one. Luc, since you've done some changes to the code, it might be easier for me to just paste the methods that have changed.

          In OpenIntToDoubleHashMapTest, please add:

              /**
               * Regression test for a bug in findInsertionIndex where the hashing in the second probing
               * loop was inconsistent with the first causing duplicate keys after the right sequence
               * of puts and removes.
               */
              public void testPutKeysWithCollisions() {
                  OpenIntToDoubleHashMap map = new OpenIntToDoubleHashMap();
                  int key1 = -1996012590;
                  double value1 = 1.0;
                  map.put(key1, value1);
                  int key2 = 835099822;
                  map.put(key2, value1);
                  int key3 = 1008859686;
                  map.put(key3, value1);
                  assertEquals(value1, map.get(key3));
                  assertEquals(3, map.size());
                  
                  map.remove(key2);
                  double value2 = 2.0;
                  map.put(key3, value2);
                  assertEquals(value2, map.get(key3));
                  assertEquals(2, map.size());
              }
              
              /**
               * Similar to testPutKeysWithCollisions() but exercises the codepaths in a slightly
               * different manner.
               */
              public void testPutKeysWithCollision2() {
                  OpenIntToDoubleHashMap map = new OpenIntToDoubleHashMap();
                  int key1 = 837989881;
                  double value1 = 1.0;
                  map.put(key1, value1);
                  int key2 = 476463321;
                  map.put(key2, value1);
                  assertEquals(2, map.size());
                  assertEquals(value1, map.get(key2));
                  
                  map.remove(key1);
                  double value2 = 2.0;
                  map.put(key2, value2);
                  assertEquals(1, map.size());
                  assertEquals(value2, map.get(key2));
              }
          

          and also update setUp to be like (I added some negative keys just in case):

              @Override
              protected void setUp() throws Exception {
                  javaMap.put(50, 100.0);
                  javaMap.put(75, 75.0);
                  javaMap.put(25, 500.0);
                  javaMap.put(Integer.MAX_VALUE, Double.MAX_VALUE);
                  javaMap.put(0, -1.0);
                  javaMap.put(1, 0.0);
                  javaMap.put(33, -0.1);
                  javaMap.put(23234234, -242343.0);
                  javaMap.put(23321, Double.MIN_VALUE);
                  javaMap.put(-4444, 332.0);
                  javaMap.put(-1, -2323.0);
                  javaMap.put(Integer.MIN_VALUE, 44.0);
          
                  /* Add a few more to cause the table to rehash */
                  javaMap.putAll(generate());
          
              }
          

          Run the tests and they should show at least one failure (I am not sure if the second collisions test will fail with the version you have, but I added it to verify a potential bug that I visually identified while modifying findInsertionIndex).

          Then change findInsertionIndex to be like:

                  private static int findInsertionIndex(int[] keys, byte[] states,
                          int key, int mask) {
                      int hash = hashOf(key);
                      int index = hash & mask;
                      if (states[index] == FREE)
                          return index;
                      else if (states[index] == FULL && keys[index] == key)
                          return changeIndexSign(index);
          
                      int perturb = perturb(hash);
                      int j = index;
                      if (states[index] == FULL) {
                          while (true) {
                              j = probe(perturb, j);
                              index = j & mask;
                              perturb >>= PERTURB_SHIFT;
                              
                              if (states[index] != FULL || keys[index] == key)
                                  break;
                          }
                      }
                      if (states[index] == FREE)
                          return index;
                      /*
                       * Due to the loop exit condition, if (states[index] == FULL) then
                       * keys[index] == key
                       */
                      else if (states[index] == FULL)
                          return changeIndexSign(index);
          
                      int firstRemoved = index;
                      while (true) {
                          j = probe(perturb, j);
                          index = j & mask;
          
                          if (states[index] == FREE)
                              return firstRemoved;
                          else if (states[index] == FULL && keys[index] == key)
                              return changeIndexSign(index);
          
                          perturb >>= PERTURB_SHIFT;
                      }
                  }
          
          

          All tests should pass after that. Nice catch by the way, it was a nasty bug.

          luc Luc Maisonobe added a comment -

          fixed in trunk as of r726460

          The map class has been put in the util package. Iteration support has been added.
          Two specialized implementations of add and subtract have been added to benefit from the small number of elements when both matrices are sparse: we iterate only over the non-zero elements.
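The sparse-plus-sparse idea can be sketched with a boxed map standing in for the open-addressed one; only the stored (non-zero) entries of either operand are ever touched, so the cost is proportional to the number of entries rather than to rows times columns. This is an illustrative sketch, not the committed code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the specialised sparse add over flattened (row-major) keys.
public class SparseAddSketch {
    static Map<Integer, Double> add(Map<Integer, Double> a, Map<Integer, Double> b) {
        // Start from a's non-zero entries, then fold in b's.
        Map<Integer, Double> result = new HashMap<Integer, Double>(a);
        for (Map.Entry<Integer, Double> entry : b.entrySet()) {
            Double existing = result.get(entry.getKey());
            double sum = (existing == null ? 0.0 : existing) + entry.getValue();
            result.put(entry.getKey(), sum);
        }
        return result;
    }
}
```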

          ijuma Ismael Juma added a comment - - edited

          Thanks for committing this Luc. Good point about specialised implementations of add and subtract. However, the current implementation has a lot of overhead due to the allocation of Entry objects. Since the iterator doesn't implement java.util.Iterator anyway, we can do better by using an iterator similar to Trove's. I will paste part of the javadoc below:

          /**
           * Iterator for maps of type int and double.
           *
           * <p>The iterator semantics for Trove's primitive maps is slightly different
           * from those defined in <tt>java.util.Iterator</tt>, but still well within
           * the scope of the pattern, as defined by Gamma, et al.</p>
           *
           * <p>This iterator does <b>not</b> implicitly advance to the next entry when
           * the value at the current position is retrieved.  Rather, you must explicitly
           * ask the iterator to <tt>advance()</tt> and then retrieve either the <tt>key()</tt>,
           * the <tt>value()</tt> or both.  This is done so that you have the option, but not
           * the obligation, to retrieve keys and/or values as your application requires, and
           * without introducing wrapper objects that would carry both.  As the iteration is
           * stateful, access to the key/value parts of the current map entry happens in
           * constant time.</p>
           *
           * <p>In practice, the iterator is akin to a "search finger" that you move from
           * position to position.  Read or write operations affect the current entry only and
           * do not assume responsibility for moving the finger.</p>
           *
           * <p>Here are some sample scenarios for this class of iterator:</p>
           *
           * <pre>
           * // accessing keys/values through an iterator:
           * for (TIntDoubleIterator it = map.iterator();
           *      it.hasNext();) {
           *   it.advance();
           *   if (satisfiesCondition(it.key())) {
           *     doSomethingWithValue(it.value());
           *   }
           * }
           * (snipped)
           */
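A minimal sketch of that advance()/key()/value() protocol over parallel primitive arrays (class and field names here are illustrative, not the committed code):

```java
// Illustrative only: a stateful "search finger" iterator that allocates
// no per-entry objects; keys and values live in parallel primitive arrays.
public class IntDoubleIterator {
    private final int[] keys;
    private final double[] values;
    private int current = -1; // positioned before the first entry

    public IntDoubleIterator(int[] keys, double[] values) {
        this.keys = keys;
        this.values = values;
    }

    public boolean hasNext() {
        return current + 1 < keys.length;
    }

    // Explicit advance: reading key()/value() never moves the finger.
    public void advance() {
        current++;
    }

    public int key() {
        return keys[current];
    }

    public double value() {
        return values[current];
    }
}
```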
          

          Shall I file a separate issue for that?

          luc Luc Maisonobe added a comment -

          No need to file another issue. I have fixed this as of r726490.
          Hope this is better now.

          ijuma Ismael Juma added a comment -

          Thanks, that is indeed what I meant by the API. I didn't have a chance to look at the implementation in detail, but seems right.


  People

  • Assignee: luc Luc Maisonobe
  • Reporter: sujitpal Sujit Pal
  • Votes: 1
  • Watchers: 2