MAHOUT-180: port Hadoop-ified Lanczos SVD implementation from decomposer


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.2
    • Fix Version/s: 0.3
    • Component/s: classic
    • Labels: None

    Description

      I wrote up a Hadoop version of the Lanczos algorithm for performing SVD on sparse matrices, available at http://decomposer.googlecode.com/, which is Apache-licensed, and I'm willing to donate it. I'll have to port the implementation over to use Mahout vectors, or else contribute those vector classes as well.

      Current issues with the decomposer implementation: if your matrix is really big, you need to re-normalize before decomposing. Find the largest eigenvalue first and divide all your rows by that value, then decompose; otherwise you'll overflow Double.MAX_VALUE once you've run too many iterations. (The L^2 norm of the intermediate vectors grows roughly as (largest_eigenvalue)^(num_eigenvalues_found_so_far), so losing precision at the low end is better than overflowing MAX_VALUE.) When this is ported to Mahout, we should add the capability to do this automatically: run a couple of iterations to find the largest eigenvalue, save it, then iterate while scaling the vectors by 1/max_eigenvalue, as in the sketch below.
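
      For intuition on the overflow: if the largest eigenvalue were, say, around 1e5 (a made-up figure), the intermediate norms would grow like (1e5)^k and pass Double.MAX_VALUE (about 1.8e308) after only about 60 of those growth steps. Below is a minimal sketch of the auto-rescaling idea, in plain Java with bare double[][] arrays rather than Mahout's Vector classes; the class and method names here are illustrative only, not taken from the attached patches. It estimates the largest eigenvalue of A^T A by power iteration, then scales the matrix by 1/lambda_max so that every eigenvalue of the scaled problem lies in [0, 1] and the intermediate Lanczos vectors can no longer overflow.

        import java.util.Random;

        public class EigenRescale {

          // Estimate the largest eigenvalue of A^T A via power iteration.
          static double largestEigenvalue(double[][] a, int iterations) {
            int cols = a[0].length;
            double[] v = new double[cols];
            Random rnd = new Random(42);
            for (int i = 0; i < cols; i++) {
              v[i] = rnd.nextDouble();
            }
            normalize(v);
            double lambda = 0.0;
            for (int it = 0; it < iterations; it++) {
              double[] w = multiplyAtA(a, v);   // w = A^T (A v)
              lambda = norm2(w);                // ||A^T A v|| converges to lambda_max
              for (int i = 0; i < cols; i++) {
                v[i] = w[i] / lambda;           // re-normalize for the next pass
              }
            }
            return lambda;
          }

          // Computes A^T (A v) without forming A^T A explicitly.
          static double[] multiplyAtA(double[][] a, double[] v) {
            int rows = a.length;
            int cols = a[0].length;
            double[] av = new double[rows];
            for (int i = 0; i < rows; i++) {
              for (int j = 0; j < cols; j++) {
                av[i] += a[i][j] * v[j];
              }
            }
            double[] w = new double[cols];
            for (int i = 0; i < rows; i++) {
              for (int j = 0; j < cols; j++) {
                w[j] += a[i][j] * av[i];
              }
            }
            return w;
          }

          static double norm2(double[] v) {
            double s = 0.0;
            for (double x : v) {
              s += x * x;
            }
            return Math.sqrt(s);
          }

          static void normalize(double[] v) {
            double n = norm2(v);
            for (int i = 0; i < v.length; i++) {
              v[i] /= n;
            }
          }

          // Divide every row by lambdaMax before handing the matrix to Lanczos.
          static void rescale(double[][] a, double lambdaMax) {
            for (double[] row : a) {
              for (int j = 0; j < row.length; j++) {
                row[j] /= lambdaMax;
              }
            }
          }

          public static void main(String[] args) {
            double[][] a = { { 3.0, 1.0 }, { 1.0, 3.0 } };
            double lambdaMax = largestEigenvalue(a, 50);
            // For this symmetric A (eigenvalues 4 and 2), A^T A has largest eigenvalue 16.
            System.out.println("largest eigenvalue of A^T A ~ " + lambdaMax);
            rescale(a, lambdaMax);
          }
        }

      In the actual Hadoop version, the multiply and rescale steps would each be a map-reduce pass over the rows of the matrix, but the numerical idea is the same.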

      Attachments

        1. MAHOUT-180.patch (52 kB, Jake Mannix)
        2. MAHOUT-180.patch (66 kB, Jake Mannix)
        3. MAHOUT-180.patch (48 kB, Jake Mannix)
        4. MAHOUT-180.patch (63 kB, Jake Mannix)
        5. MAHOUT-180.patch (66 kB, Jake Mannix)

            People

              Assignee: Jake Mannix (jake.mannix)
              Reporter: Jake Mannix (jake.mannix)
              Votes: 0
              Watchers: 2
