The issue is that the current row and column aggregation API makes it difficult to do anything but row-by-row aggregation using anonymous classes. There is no way to exploit locality, nor to use the well-known function definitions in Functions. This rules out many optimizations, and many of these are optimizations that we want to have. One example is summing the absolute values of the elements. With the current API it would be very hard to optimize this for sparse matrices, or for a matrix whose storage order runs against the direction of iteration, but with a different API this should be easy.
What I suggest is an API of this form:
This will produce a vector with one element per row of the original matrix. The nice thing here is that if the matrix is row-major, we can iterate over rows and accumulate a value for each row, using sparsity where available. On the other hand, if the matrix is column-major, we can walk down each column while keeping a vector of per-row accumulators, and still use sparsity as appropriate.
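A minimal sketch of the two loop orders described above, written in plain Java rather than against the actual matrix classes. The names `RowAggregation`, `aggregateRowsRowMajor` and `aggregateRowsColumnMajor`, and the explicit `identity` seed for each accumulator, are hypothetical; the standard `java.util.function` interfaces stand in for the map and combine functions from Functions. Sparsity is left out here so the two traversals are easy to compare.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch, not an existing API: the two loop orders a matrix
// implementation could choose between when computing one aggregate per row.
final class RowAggregation {
  private RowAggregation() {}

  // Row-major storage: rows[i][j] is element (i, j). Each row is read
  // contiguously and its single accumulator is finished before moving on.
  static double[] aggregateRowsRowMajor(double[][] rows,
                                        DoubleBinaryOperator combine,
                                        DoubleUnaryOperator map,
                                        double identity) {
    double[] result = new double[rows.length];
    for (int i = 0; i < rows.length; i++) {
      double acc = identity;
      for (double x : rows[i]) {
        acc = combine.applyAsDouble(acc, map.applyAsDouble(x));
      }
      result[i] = acc;
    }
    return result;
  }

  // Column-major storage: cols[j][i] is element (i, j). Each column is read
  // contiguously while a vector of per-row accumulators is kept alive.
  static double[] aggregateRowsColumnMajor(double[][] cols, int numRows,
                                           DoubleBinaryOperator combine,
                                           DoubleUnaryOperator map,
                                           double identity) {
    double[] acc = new double[numRows];
    Arrays.fill(acc, identity);
    for (double[] col : cols) {
      for (int i = 0; i < numRows; i++) {
        acc[i] = combine.applyAsDouble(acc[i], map.applyAsDouble(col[i]));
      }
    }
    return acc;
  }
}
```

For example, `aggregateRowsRowMajor(rows, Double::sum, Math::abs, 0.0)` yields the sum of absolute values of each row. Both methods compute the same vector; they differ only in which loop is outermost, which is exactly the choice a caller-supplied anonymous class cannot make.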
Sparsity can be exploited because the matrix code now controls both of the loops involved and can also see properties of the map and combine functions. For instance, ABS(0) == 0 and 0 is the identity for PLUS, so zero elements contribute nothing to the sum; if we combine with PLUS, we can therefore use a sparse iterator that skips them.
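The sparsity condition can be sketched the same way. This assumes a hypothetical `SparseRow` that stores only its nonzero entries, a caller-supplied `identity` satisfying combine(acc, identity) == acc, and a combine function that is associative and commutative, as PLUS is. Zeros are skipped only when map(0) equals that identity; otherwise the implicit zeros are folded in explicitly, so the result is the same either way and only the amount of work changes.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sparse row: only nonzero entries are stored, as parallel
// arrays of column indices and values; every other position is zero.
final class SparseRow {
  final int size;
  final int[] indices;
  final double[] values;

  SparseRow(int size, int[] indices, double[] values) {
    this.size = size;
    this.indices = indices;
    this.values = values;
  }
}

// Hypothetical sketch, not an existing API. Assumes identity is neutral for
// combine and that combine is associative and commutative, so the order in
// which stored entries and implicit zeros are folded does not matter.
final class SparseAggregation {
  private SparseAggregation() {}

  static double aggregate(SparseRow row,
                          DoubleBinaryOperator combine,
                          DoubleUnaryOperator map,
                          double identity) {
    double acc = identity;
    // Always visit the stored (nonzero) entries.
    for (double v : row.values) {
      acc = combine.applyAsDouble(acc, map.applyAsDouble(v));
    }
    double mappedZero = map.applyAsDouble(0.0);
    if (mappedZero != identity) {
      // map(0) is not neutral for combine (e.g. x -> x + 1 with PLUS), so
      // every implicit zero still contributes and must be folded in.
      int implicitZeros = row.size - row.values.length;
      for (int k = 0; k < implicitZeros; k++) {
        acc = combine.applyAsDouble(acc, mappedZero);
      }
    }
    // Otherwise (e.g. ABS with PLUS) the zeros are skipped entirely.
    return acc;
  }
}
```

With ABS and PLUS the loop touches only the stored entries, which is the sparse-iterator case described above; a map such as `x -> x + 1` falls back to accounting for every element.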