Refactoring the vector and matrix classes


Details

• Improvement
• Status: Open
• Major
• Resolution: Unresolved
• 3.3
• None

Description

Warning

This is not a bug report, but rather a summary of all discussions which have taken place on the mailing list regarding the refactoring of the vector and matrix classes. Indeed, it has been argued many times that the RealVector and RealMatrix interfaces are really cluttered, and could benefit from other approaches (like functional programming).

The description of this ticket will be updated as the discussion progresses on the mailing-list, and new JIRA tickets will be created to carry out the "real" work. In order to keep this ticket tidy, contributors should refrain from commenting on this website. Instead, messages should be posted on the dev mailing-list.

The current API (version 3.0)

In this section, the current interfaces for vectors and matrices are compared. Vectors and matrices are two mathematical objects which are very close in nature. Their implementations should therefore be as similar as possible. The methods are sorted as follows:

• methods reflecting the mathematical structure of vector space: addition, multiplication by a scalar, matrix-vector product, ...
• methods reflecting the mathematical structure of Euclidean space
• ...

Methods reflecting the mathematical structure of vector space

List of the methods

RealVector | RealMatrix | Comment
RealVector subtract(RealVector v) | RealMatrix subtract(RealMatrix m) | -
int getDimension() | int getRowDimension(), int getColumnDimension() | -
RealVector mapMultiply(double d) | scalarMultiply(double d) | (1)
RealVector mapMultiplyToSelf(double d) | - | -
RealVector outerProduct(RealVector v) | - | -
- | double getTrace() | -
- | multiply(RealMatrix m) | -
- | double[] operate(double[]) | (2)
- | RealVector operate(RealVector) | -
- | RealMatrix power(int p) | -
- | double[] preMultiply(double[]) | (2)
- | RealMatrix preMultiply(RealMatrix) | -
- | RealVector preMultiply(RealVector) | -
- | RealMatrix transpose() | -

Comment (1)

RealVector RealVector.mapMultiply(double) and RealMatrix RealMatrix.scalarMultiply(double) perform essentially the same task. Readability of the classes would be improved if they had the same name. This matters because these methods reflect the fact that the space of vectors and the space of matrices are both vector spaces.
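One way to make the shared vector-space structure explicit would be a common generic interface. The sketch below is purely illustrative: VectorSpaceElement, Vec2, VectorSpaceOps and the choice of scalarMultiply as the common name are assumptions, not part of Commons Math.

```java
/** Hypothetical interface: operations common to every element of a real vector space. */
interface VectorSpaceElement<T extends VectorSpaceElement<T>> {
    T add(T other);
    T subtract(T other);
    T scalarMultiply(double d); // a single name for vectors and matrices alike
}

/** Toy 2-D vector, present only to exercise the interface. */
final class Vec2 implements VectorSpaceElement<Vec2> {
    final double x;
    final double y;

    Vec2(double x, double y) {
        this.x = x;
        this.y = y;
    }

    @Override public Vec2 add(Vec2 o) { return new Vec2(x + o.x, y + o.y); }
    @Override public Vec2 subtract(Vec2 o) { return new Vec2(x - o.x, y - o.y); }
    @Override public Vec2 scalarMultiply(double d) { return new Vec2(d * x, d * y); }
}

final class VectorSpaceOps {
    /** Computes a*u + b*v for any implementation, vector or matrix. */
    static <T extends VectorSpaceElement<T>> T linearCombination(double a, T u, double b, T v) {
        return u.scalarMultiply(a).add(v.scalarMultiply(b));
    }
}
```

With such an interface, generic code like linearCombination would be written once and would apply to any type that implements it.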

Comment (2)

Prior to the release of version 3.0, all methods taking as argument, or returning, double[] as a representation of vectors were removed. The rationale for this is that calling new ArrayRealVector(double[], false) is very easy, and comes at virtually no cost (see MATH-653 and MATH-660). It might be worth considering the same simplification for the RealMatrix interface.
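The "virtually no cost" argument rests on the second constructor argument: when it is false, the array is wrapped rather than copied. The class below is a minimal stand-in written for this ticket, not the Commons Math class; WrappingVector is a hypothetical name that only mimics the (double[], boolean) constructor contract.

```java
/** Hypothetical stand-in mimicking a (double[] d, boolean copyArray) constructor. */
final class WrappingVector {
    private final double[] data;

    WrappingVector(double[] d, boolean copyArray) {
        // copyArray == false: keep a reference to the caller's array (no allocation).
        // copyArray == true: take a defensive copy (cost proportional to the length).
        this.data = copyArray ? d.clone() : d;
    }

    double getEntry(int i) {
        return data[i];
    }
}
```

The flip side of wrapping is aliasing: later changes to the original array are visible through the wrapping vector, whereas a copying vector is unaffected.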

Methods reflecting the mathematical structure of Euclidean space

List of the methods

RealVector | RealMatrix | Comment
double cosine(RealVector v) | - | -
double dotProduct(RealVector v) | - | (3)
double getDistance(RealVector v) | - | -
double getNorm() | double getFrobeniusNorm() | -
RealVector projection(RealVector v) | - | -
void unitize() | - | (4)
RealVector unitVector() | - | -

Comment (3)

In a way, RealMatrix RealMatrix.transpose() could be seen as a method inherent to the Euclidean structure, and as the generalization of the dot product. For this reason, transpose() should probably not be externalized.

Comment (4)

This could be externalized with the visitor pattern (see below).
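As a rough illustration of this externalization, the sketch below normalizes a vector in place through a visitor that rescales each entry. EntryChangingVisitor and VisitorUnitize are hypothetical names, and a plain double[] stands in for a RealVector.

```java
/** Hypothetical visitor that may replace each entry it visits. */
interface EntryChangingVisitor {
    double visit(int index, double value);
}

final class VisitorUnitize {
    /** Applies the visitor to every entry and stores the returned value in place. */
    static void walk(double[] v, EntryChangingVisitor visitor) {
        for (int i = 0; i < v.length; i++) {
            v[i] = visitor.visit(i, v[i]);
        }
    }

    /** Rescales v in place to unit Euclidean norm by means of a scaling visitor. */
    static void unitize(double[] v) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        final double norm = Math.sqrt(sumSq);
        if (norm == 0.0) {
            throw new ArithmeticException("cannot unitize a zero-norm vector");
        }
        walk(v, (index, value) -> value / norm);
    }
}
```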

Comment (5)

Could be externalized in a factory class.

Constructors, factory methods and related methods

List of the methods

RealVector | RealMatrix | Comment
RealVector append(double d) | - | -
RealVector append(RealVector v) | - | -
RealVector copy() | RealMatrix copy() | -
- | void copySubMatrix(int[] selectedRows, int[] selectedColumns, double[][] destination) | (6), (7)
- | void copySubMatrix(int startRow, int endRow, int startColumn, int endColumn, double[][] destination) | (7)
- | createMatrix(int rowDimension, int columnDimension) | (9)
- | double[] getColumn(int column) | (6)
- | RealMatrix getColumnMatrix(int column) | -
- | RealVector getColumnVector(int column) | -
- | double[] getRow(int row) | (6)
- | RealMatrix getRowMatrix(int row) | -
- | RealVector getRowVector(int row) | -
- | RealMatrix getSubMatrix(int[] selectedRows, int[] selectedColumns) | -
RealVector getSubVector(int index, int n) | RealMatrix getSubMatrix(int startRow, int endRow, int startColumn, int endColumn) | (8)
- | void setColumn(int column, double[] array) | (6)
- | void setColumnMatrix(int column, RealMatrix matrix) | -
- | void setColumnVector(int column, RealVector vector) | -
- | void setRow(int row, double[] array) | (6)
- | void setRowMatrix(int row, RealMatrix matrix) | -
- | void setRowVector(int row, RealVector vector) | -
void setSubVector(int index, RealVector v) | void setSubMatrix(double[][] subMatrix, int row, int column) | (6)
unmodifiableRealVector(RealVector v) | - | -

Comment (6)

Prior to the release of version 3.0, all methods taking as argument, or returning, double[] as a representation of vectors were removed. The rationale for this is that calling new ArrayRealVector(double[], false) is very easy, and comes at virtually no cost (see MATH-653 and MATH-660). It might be worth considering the same simplification for the RealMatrix interface.

Comment (7)

The signature of this method is rather unusual in Commons-Math, as one of the parameters is modified, and nothing is returned.

Comment (8)

Inconsistency: in getSubVector(int, int), the second parameter is the number of entries to be copied, while in getSubMatrix(int, int, int, int) the second (resp. fourth) parameter is the index of the last row (resp. column).
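The practical effect of the two conventions can be seen on a plain array. In the sketch below, SubRangeConventions and its two helpers are hypothetical; they only mirror the (start, count) and (start, inclusive end) conventions described above.

```java
import java.util.Arrays;

final class SubRangeConventions {
    /** (start, count) convention, as in getSubVector(int index, int n). */
    static double[] byCount(double[] v, int index, int n) {
        return Arrays.copyOfRange(v, index, index + n);
    }

    /** (start, inclusive end) convention, as in getSubMatrix(startRow, endRow, ...). */
    static double[] byInclusiveEnd(double[] v, int start, int end) {
        return Arrays.copyOfRange(v, start, end + 1);
    }
}
```

Called with the same arguments (2, 3) on a five-entry array, byCount returns three entries (indices 2 to 4) while byInclusiveEnd returns two (indices 2 and 3).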

Comment (9)

This is a very useful method: one often needs to create a new vector/matrix with the same data layout as an existing vector/matrix. This method should probably be generalized to RealVector as well.

Manipulation of entries

List of the methods

RealVector | RealMatrix | Comment
double getEntry(int index) | double getEntry(int row, int column) | -
void set(double value) | - | (10)
void setEntry(int index, double value) | void setEntry(int row, int column, double value) | -
void addToEntry(int index, double increment) | void addToEntry(int row, int column, double increment) | (11)
- | void multiplyEntry(int row, int column, double factor) | (11)

Comment (10)

This useful method could be extended to RealMatrix as well. However, it could be argued that this method is superfluous, as the visitor pattern (or indeed the mapToSelf() method) would do in this case.

Comment (11)

These are typical examples of useful methods which can lead to uncontrolled growth of the APIs.

Various norms and related methods

List of the methods

RealVector | RealMatrix | Comment
double getL1Distance(RealVector v) | - | (13)
double getL1Norm() | - | (12)
double getLInfDistance(RealVector v) | - | (13)
double getLInfNorm() | double getNorm() | (12), (14)
int getMaxIndex() | - | (12), (15)
double getMaxValue() | - | (12), (15)
int getMinIndex() | - | (12), (15)
double getMinValue() | - | (12), (15)

Comment (12)

This method could be removed from the RealVector API. Visitors should be implemented instead.
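A minimal sketch of what such visitors might look like follows. EntryPreservingVisitor is a hypothetical interface, loosely modeled on the existing RealMatrixPreservingVisitor, and a plain double[] stands in for a RealVector.

```java
/** Hypothetical read-only visitor: sees every entry, then reports a result. */
interface EntryPreservingVisitor {
    void visit(int index, double value);
    double end();
}

final class NormVisitors {
    /** Feeds every entry to the visitor and returns the visitor's final result. */
    static double walk(double[] v, EntryPreservingVisitor visitor) {
        for (int i = 0; i < v.length; i++) {
            visitor.visit(i, v[i]);
        }
        return visitor.end();
    }

    /** Visitor accumulating the sum of absolute values (L1 norm). */
    static EntryPreservingVisitor l1Norm() {
        return new EntryPreservingVisitor() {
            private double sum = 0.0;
            @Override public void visit(int index, double value) { sum += Math.abs(value); }
            @Override public double end() { return sum; }
        };
    }

    /** Visitor tracking the largest absolute value (L-infinity norm). */
    static EntryPreservingVisitor lInfNorm() {
        return new EntryPreservingVisitor() {
            private double max = 0.0;
            @Override public void visit(int index, double value) { max = Math.max(max, Math.abs(value)); }
            @Override public double end() { return max; }
        };
    }
}
```

Under this approach each norm lives in its own small class, so new ones could be added without touching the vector interface.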

Comment (13)

This method could be removed from the RealVector API. An alternative functional approach should be proposed (e.g. zip).
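One possible reading of "zip" here is to pair up corresponding entries, transform each pair, and fold the results into a single number. The sketch below assumes that reading; ZipReduce and its methods are hypothetical names, and plain arrays stand in for RealVector instances.

```java
import java.util.function.DoubleBinaryOperator;

final class ZipReduce {
    /** Applies f to each pair of entries, then folds the results with g, starting from identity. */
    static double zipReduce(double[] a, double[] b,
                            DoubleBinaryOperator f, double identity, DoubleBinaryOperator g) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double acc = identity;
        for (int i = 0; i < a.length; i++) {
            acc = g.applyAsDouble(acc, f.applyAsDouble(a[i], b[i]));
        }
        return acc;
    }

    /** Sum of absolute differences. */
    static double l1Distance(double[] a, double[] b) {
        return zipReduce(a, b, (x, y) -> Math.abs(x - y), 0.0, Double::sum);
    }

    /** Largest absolute difference. */
    static double lInfDistance(double[] a, double[] b) {
        return zipReduce(a, b, (x, y) -> Math.abs(x - y), 0.0, Math::max);
    }
}
```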

Comment (14)

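Inconsistency: getNorm() does not refer to the same type of norm in the two interfaces. RealVector.getNorm() returns the L2 (Euclidean) norm, whereas RealMatrix.getNorm() returns the maximum absolute row sum norm.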

Comment (15)

Methods int getXXXIndex() and double getXXXValue() could be merged; the return type might be Pair<Integer, Double>. This would avoid an inefficient duplicate sweep of the vector when both the value and the index are needed.
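A minimal sketch of such a merged method is given below for the minimum. MinEntry is a hypothetical class, java.util.Map.Entry is used as a stand-in for Pair, and NaN entries receive no special treatment.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

final class MinEntry {
    /** Returns (index, value) of the smallest entry, found in a single pass. */
    static Map.Entry<Integer, Double> min(double[] v) {
        if (v.length == 0) {
            throw new IllegalArgumentException("empty vector");
        }
        int minIndex = 0;
        for (int i = 1; i < v.length; i++) {
            if (v[i] < v[minIndex]) {
                minIndex = i; // strict comparison keeps the first occurrence on ties
            }
        }
        return new SimpleImmutableEntry<>(minIndex, v[minIndex]);
    }
}
```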

Functional-programming-like methods

List of the methods

The methods listed below include both "truly" FP methods and methods which could benefit from an FP approach.

RealVector | RealMatrix | Comment
Iterator<RealVector.Entry> iterator() | - | -
Iterator<RealVector.Entry> sparseIterator() | - | -
RealVector map(UnivariateFunction function) | - | (18)
RealVector mapToSelf(UnivariateFunction function) | - | (18)
RealVector mapDivide(double d) | - | (16)
RealVector mapDivideToSelf(double d) | - | (16)
RealVector mapSubtract(double d) | - | (16)
RealVector mapSubtractToSelf(double d) | - | (16)
RealVector ebeDivide(RealVector v) | - | (17)
RealVector ebeMultiply(RealVector v) | - | (17)
RealVector combine(double a, double b, RealVector y) | - | (17)
RealVector combineToSelf(double a, double b, RealVector y) | - | (17)
- | double walkInColumnOrder(RealMatrixChangingVisitor visitor) | (18)
- | double walkInColumnOrder(RealMatrixChangingVisitor visitor, int startRow, int endRow, int startColumn, int endColumn) | (18)
- | double walkInColumnOrder(RealMatrixPreservingVisitor visitor) | (18)
- | double walkInColumnOrder(RealMatrixPreservingVisitor visitor, int startRow, int endRow, int startColumn, int endColumn) | (18)
- | double walkInOptimizedOrder(RealMatrixChangingVisitor visitor) | (18)
- | double walkInOptimizedOrder(RealMatrixChangingVisitor visitor, int startRow, int endRow, int startColumn, int endColumn) | (18)
- | double walkInOptimizedOrder(RealMatrixPreservingVisitor visitor) | (18)
- | double walkInOptimizedOrder(RealMatrixPreservingVisitor visitor, int startRow, int endRow, int startColumn, int endColumn) | (18)
- | double walkInRowOrder(RealMatrixChangingVisitor visitor) | (18)
- | double walkInRowOrder(RealMatrixChangingVisitor visitor, int startRow, int endRow, int startColumn, int endColumn) | (18)
- | double walkInRowOrder(RealMatrixPreservingVisitor visitor) | (18)
- | double walkInRowOrder(RealMatrixPreservingVisitor visitor, int startRow, int endRow, int startColumn, int endColumn) | (18)

Comment (16)

These methods could be removed, since a call to map() or mapToSelf() with the appropriate function achieves the same result.
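To illustrate, the sketch below expresses a division and an in-place subtraction through generic map operations. MapSketch is a hypothetical class working on plain arrays, and java.util.function.DoubleUnaryOperator stands in for UnivariateFunction.

```java
import java.util.function.DoubleUnaryOperator;

final class MapSketch {
    /** Returns a new array holding f applied to each entry (analogue of map()). */
    static double[] map(double[] v, DoubleUnaryOperator f) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = f.applyAsDouble(v[i]);
        }
        return out;
    }

    /** Applies f to each entry in place and returns the same array (analogue of mapToSelf()). */
    static double[] mapToSelf(double[] v, DoubleUnaryOperator f) {
        for (int i = 0; i < v.length; i++) {
            v[i] = f.applyAsDouble(v[i]);
        }
        return v;
    }

    /** Equivalent of mapDivide(d), written as a one-line call to map(). */
    static double[] mapDivide(double[] v, double d) {
        return map(v, x -> x / d);
    }

    /** Equivalent of mapSubtractToSelf(d), written as a one-line call to mapToSelf(). */
    static double[] mapSubtractToSelf(double[] v, double d) {
        return mapToSelf(v, x -> x - d);
    }
}
```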

Comment (17)

These methods could benefit from an FP approach (e.g. zip).
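As a rough sketch of that idea, element-by-element operations and linear combinations can all be derived from one generic zipWith. ZipWith and its methods are hypothetical names, plain arrays stand in for RealVector instances, and combine is assumed to compute a * x + b * y.

```java
import java.util.function.DoubleBinaryOperator;

final class ZipWith {
    /** Returns a new array whose i-th entry is f(a[i], b[i]). */
    static double[] zipWith(double[] a, double[] b, DoubleBinaryOperator f) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = f.applyAsDouble(a[i], b[i]);
        }
        return out;
    }

    /** Element-by-element product. */
    static double[] ebeMultiply(double[] x, double[] y) {
        return zipWith(x, y, (u, v) -> u * v);
    }

    /** Linear combination a * x + b * y. */
    static double[] combine(double a, double b, double[] x, double[] y) {
        return zipWith(x, y, (u, v) -> a * u + b * v);
    }
}
```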

Comment (18)

map() and visit() are similar concepts, which are both useful. Should map() be implemented for RealMatrix? Should visit() be implemented for RealVector?

Data conversion methods

List of the methods

RealVector | RealMatrix | Comment
double[] toArray() | double[][] getData() | -

People

Unassigned
Sebastien Brisard