I don't really know anything about the way that SKMD works, so all I can weigh in on is what's going on in Lanczos:
You take an input matrix with some number of rows (this number doesn't matter and doesn't show up anywhere) and numCols columns (this number matters a lot). You want desiredRank eigenvectors to pop out at the end. So you start with some initial basisVector (number 0) and iterate: at each step you compute corpus.timesSquared(basisIminusOne) (the resulting vector has size numCols), orthogonalize it against the previous vectors, and hang onto the result as the next basis vector.
Eventually you have desiredRank basisVectors, held in the LanczosState object as a Map<Integer,Vector> (it could certainly be a Matrix, and conceptually it is one, but we're just hanging onto the vectors before building a matrix soon enough). Meanwhile, we're building up a desiredRank x desiredRank tridiagonal (i.e. very sparse) matrix from these basis vectors and their inner products.
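To make the loop above concrete, here is a minimal sketch in plain Java, with arrays standing in for Mahout's Vector and Matrix types. It is not the actual LanczosSolver code: the class name LanczosSketch, the method iterate, and the choice to orthogonalize against every earlier vector are my own assumptions for illustration. Only timesSquared mirrors a name from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Mahout's LanczosSolver.
class LanczosSketch {

    // Computes A^T (A v), a stand-in for corpus.timesSquared(v).
    // The result has numCols entries no matter how many rows the corpus has.
    static double[] timesSquared(double[][] corpus, double[] v) {
        double[] out = new double[v.length];
        for (double[] row : corpus) {
            double rowDotV = dot(row, v);
            for (int j = 0; j < v.length; j++) {
                out[j] += rowDotV * row[j];
            }
        }
        return out;
    }

    static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int j = 0; j < x.length; j++) {
            sum += x[j] * y[j];
        }
        return sum;
    }

    // Fills `basis` (index -> basis vector) and returns the
    // desiredRank x desiredRank tridiagonal matrix.
    static double[][] iterate(double[][] corpus, double[] initial, int desiredRank,
                              Map<Integer, double[]> basis) {
        double[][] tri = new double[desiredRank][desiredRank];
        double initialNorm = Math.sqrt(dot(initial, initial));
        double[] first = new double[initial.length];
        for (int j = 0; j < initial.length; j++) {
            first[j] = initial[j] / initialNorm;
        }
        basis.put(0, first);
        for (int i = 0; i < desiredRank; i++) {
            double[] current = basis.get(i);
            double[] next = timesSquared(corpus, current);
            tri[i][i] = dot(next, current);
            // Orthogonalize against all vectors kept so far (an assumption of this sketch).
            for (int k = 0; k <= i; k++) {
                double[] prev = basis.get(k);
                double proj = dot(next, prev);
                for (int j = 0; j < next.length; j++) {
                    next[j] -= proj * prev[j];
                }
            }
            double beta = Math.sqrt(dot(next, next));
            // Stop once we have desiredRank vectors, or if nothing new is left to add.
            if (i + 1 == desiredRank || beta < 1e-12) {
                break;
            }
            tri[i][i + 1] = beta;
            tri[i + 1][i] = beta;
            for (int j = 0; j < next.length; j++) {
                next[j] /= beta;
            }
            basis.put(i + 1, next);
        }
        return tri;
    }

    public static void main(String[] args) {
        double[][] corpus = {{1, 0, 0}, {0, 2, 0}, {0, 0, 3}};
        Map<Integer, double[]> basis = new HashMap<>();
        double[][] tri = iterate(corpus, new double[] {1, 1, 1}, 3, basis);
        System.out.println("basis vectors kept: " + basis.size());
        System.out.println("tri[0][0] = " + tri[0][0]);
    }
}
```

Note that the map is indexed 0 through desiredRank - 1, and every vector in it has numCols entries. Those are the two sizes that have to agree with whatever reads the map later.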
Now we ask COLT for the eigenvectors and eigenvalues of the tridiagonal matrix. There will be desiredRank eigenvalues and desiredRank eigenvectors (each of dimension desiredRank).
Here we get to where you're getting an NPE. We walk along the desiredRank^2 values in the eigenvector matrix ("eigenVects"), and for each index in 0 ... desiredRank - 1, we grab the corresponding basisVector (we have desiredRank of them, each of size numCols) and add a scalar multiple of it onto what will become the final eigenvector we return at the end of the day.
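As a rough sketch of that last step, and of how a size mismatch could surface as a null lookup, here is a plain-Java version. The class RitzVectorSketch and the method combine are hypothetical names, and I'm assuming (not asserting) that the NPE comes from asking the map for a basis vector index that was never stored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the Mahout implementation.
class RitzVectorSketch {

    // Builds final eigenvector number `col` as
    //   sum over k of eigenVects[k][col] * basis.get(k).
    static double[] combine(Map<Integer, double[]> basis, double[][] eigenVects,
                            int col, int numCols) {
        double[] result = new double[numCols];
        for (int k = 0; k < eigenVects.length; k++) {
            double[] basisVector = basis.get(k);
            if (basisVector == null) {
                // Map.get returns null for a missing key. Without this check the
                // inner loop below would throw a NullPointerException instead.
                throw new IllegalStateException(
                    "no basis vector " + k + ": have " + basis.size()
                        + " but eigenVects expects " + eigenVects.length);
            }
            for (int j = 0; j < numCols; j++) {
                result[j] += eigenVects[k][col] * basisVector[j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, double[]> basis = new HashMap<>();
        basis.put(0, new double[] {1, 0, 0});
        basis.put(1, new double[] {0, 1, 0});
        double[][] eigenVects = {{0.6, -0.8}, {0.8, 0.6}};
        double[] v = combine(basis, eigenVects, 0, 3);
        System.out.println(v[0] + " " + v[1] + " " + v[2]);
    }
}
```

If the state was sized with one rank and the solver is run with another, the number of stored basis vectors and the dimension of eigenVects can disagree, which is the situation the explicit check guards against.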
What is SKMD doing?
LanczosState state = new LanczosState(L, overshoot, numDims, solver.getInitialVector(L));
Path lanczosSeqFiles = new Path(outputCalc, "eigenvectors-" + (System.nanoTime() & 0xFF));
We're making a LanczosState with numCols = overshoot and desiredRank = numDims.
Then we run the solver with desiredRank = overshoot.
This looks inconsistent: shouldn't the desiredRank be the same in both places?