In method tallSkinnyQR, the final Q is calculated by A * inv(R) (Github Link). When the upper triangular matrix R is ill-conditioned, computing the inverse of R can result in catastrophic cancellation. Instead, we should consider using a forward solve for solving Q such that Q * R = A.

I first create a 4 by 4 RowMatrix A = (1,1,1,1;0,1E-5,0,0;0,0,1E-10,1;0,0,0,1E-14), and then I apply method tallSkinnyQR to A to find RowMatrix Q and Matrix R such that A = Q*R. In this case, A is ill-conditioned and so is R.

See codes in Spark Shell:

import org.apache.spark.mllib.linalg.{Matrices, Vector, Vectors} import org.apache.spark.mllib.linalg.distributed.RowMatrix // Create RowMatrix A. val mat = Seq(Vectors.dense(1,1,1,1), Vectors.dense(0, 1E-5, 1,1), Vectors.dense(0,0,1E-10,1), Vectors.dense(0,0,0,1E-14)) val denseMat = new RowMatrix(sc.parallelize(mat, 2)) // Apply tallSkinnyQR to A. val result = denseMat.tallSkinnyQR(true) // Print the calculated Q and R. result.Q.rows.collect.foreach(println) result.R // Calculate Q*R. Ideally, this should be close to A. val reconstruct = result.Q.multiply(result.R) reconstruct.rows.collect.foreach(println) // Calculate Q'*Q. Ideally, this should be close to the identity matrix. result.Q.computeGramianMatrix() System.exit(0)

it will output the following results:

scala> result.Q.rows.collect.foreach(println) [1.0,0.0,0.0,1.5416524685312E13] [0.0,0.9999999999999999,0.0,8011776.0] [0.0,0.0,1.0,0.0] [0.0,0.0,0.0,1.0] scala> result.R 1.0 1.0 1.0 1.0 0.0 1.0E-5 1.0 1.0 0.0 0.0 1.0E-10 1.0 0.0 0.0 0.0 1.0E-14 scala> reconstruct.rows.collect.foreach(println) [1.0,1.0,1.0,1.15416524685312] [0.0,9.999999999999999E-6,0.9999999999999999,1.00000008011776] [0.0,0.0,1.0E-10,1.0] [0.0,0.0,0.0,1.0E-14] scala> result.Q.computeGramianMatrix() 1.0 0.0 0.0 1.5416524685312E13 0.0 0.9999999999999998 0.0 8011775.999999999 0.0 0.0 1.0 0.0 1.5416524685312E13 8011775.999999999 0.0 2.3766923337289844E26

With forward solve for solving Q such that Q * R = A rather than computing the inverse of R, it will output the following results instead:

scala> result.Q.rows.collect.foreach(println) [1.0,0.0,0.0,0.0] [0.0,1.0,0.0,0.0] [0.0,0.0,1.0,0.0] [0.0,0.0,0.0,1.0] scala> result.R 1.0 1.0 1.0 1.0 0.0 1.0E-5 1.0 1.0 0.0 0.0 1.0E-10 1.0 0.0 0.0 0.0 1.0E-14 scala> reconstruct.rows.collect.foreach(println) [1.0,1.0,1.0,1.0] [0.0,1.0E-5,1.0,1.0] [0.0,0.0,1.0E-10,1.0] [0.0,0.0,0.0,1.0E-14] scala> result.Q.computeGramianMatrix() 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0