Description
Techniques are reviewed in <a href="http://arxiv.org/abs/0909.4061">Halko, Martinsson, and Tropp</a>.
The basic idea of the implementation is as follows. The input matrix is represented as a DistributedSparseRowMatrix (backed by a sequence file of <Writable,VectorWritable>, whose values should be SequentialAccessSparseVector instances for best performance). Optionally, a kernel function f(v) maps sparse numColumns-dimensional vectors (numColumns is unconstrained in size) to sparse numKernelizedFeatures-dimensional vectors (also unconstrained in size); this is what you would use for kernel-PCA, for example, with a kernel k(u,v) = f(u).dot( f(v) ). The MurmurHash (from MAHOUT-228) then projects the numKernelizedFeatures-dimensional vectors down to a reasonably sized numHashedFeatures-dimensional space (no more than 10^2 to 10^4 dimensions).
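A minimal sketch of that mapper-side hashed projection, assuming Mahout's Vector API; the murmur() helper below is only a hypothetical stand-in for the actual MurmurHash implementation from MAHOUT-228, and the signed-bucket scheme is one common way to make hash collisions cancel in expectation:

{code:java}
import java.util.Iterator;

import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public final class HashedProjection {

  // Hypothetical 32-bit mix, standing in for the MAHOUT-228 MurmurHash.
  private static int murmur(int key, int seed) {
    int h = seed ^ key;
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  /**
   * Hashes each nonzero (kernelized) feature into one of numHashedFeatures
   * buckets; a second hash bit supplies a +/-1 sign so that colliding
   * features cancel in expectation rather than accumulate.
   */
  public static Vector project(Vector kernelized, int numHashedFeatures, int seed) {
    Vector hashed = new RandomAccessSparseVector(numHashedFeatures);
    Iterator<Vector.Element> it = kernelized.iterateNonZero();
    while (it.hasNext()) {
      Vector.Element e = it.next();
      // Shift out the sign bit so the modulus is always non-negative.
      int bucket = (murmur(e.index(), seed) >>> 1) % numHashedFeatures;
      double sign = (murmur(e.index(), seed + 1) & 1) == 0 ? 1.0 : -1.0;
      hashed.setQuick(bucket, hashed.getQuick(bucket) + sign * e.get());
    }
    return hashed;
  }
}
{code}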
This is all done in the Mapper, which emits two outputs. The first is the numHashedFeatures-dimensional vector itself (in case the left-singular vectors are ever desired); it does not need to be Reduced. The second is the outer product of this vector with itself, for which the Reducer/Combiner just performs the matrix sum over the partial outputs, eventually producing the kernel / Gram matrix of the hashed features. That matrix can then be run through a simple eigen-decomposition, and its eigenvectors (scaled by 1/sqrt(eigenvalue), i.e. by the inverse singular values) can be applied to project the (optional) numHashedFeatures-dimensional outputs mentioned above, yielding the left-singular vectors / reduced projections (which can then be run through clustering, etc.).
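A minimal sketch of the Reducer/Combiner-side Gram accumulation and of the final projection step, again assuming Mahout's in-memory Matrix/Vector API; the eigen-decomposition itself is left to whatever dense symmetric eigensolver is at hand:

{code:java}
import org.apache.mahout.math.DenseMatrix;
import org.apache.mahout.math.Matrix;
import org.apache.mahout.math.Vector;

public final class GramAccumulator {

  private final Matrix gram;

  public GramAccumulator(int numHashedFeatures) {
    gram = new DenseMatrix(numHashedFeatures, numHashedFeatures);
  }

  // Combiner/Reducer body: add the outer product v * v^T into the running sum.
  public void accumulate(Vector v) {
    for (int i = 0; i < v.size(); i++) {
      double vi = v.getQuick(i);
      if (vi == 0.0) {
        continue;
      }
      for (int j = 0; j < v.size(); j++) {
        gram.setQuick(i, j, gram.getQuick(i, j) + vi * v.getQuick(j));
      }
    }
  }

  public Matrix gram() {
    return gram;
  }

  // The Gram matrix is A^T A, so its eigenvectors are the right-singular
  // vectors and its eigenvalues the squared singular values. Projecting a
  // hashed row onto an eigenvector and dividing by sqrt(eigenvalue) gives
  // that row's coordinate along the corresponding left-singular vector.
  public static double leftSingularCoordinate(Vector hashedRow,
                                              Vector eigenVector,
                                              double eigenValue) {
    return hashedRow.dot(eigenVector) / Math.sqrt(eigenValue);
  }
}
{code}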
Good fun will be had by all.
Issue Links
- is blocked by MAHOUT-228: Need sequential logistic regression implementation using SGD techniques (Closed)
- is duplicated by MAHOUT-376: Implement Map-reduce version of stochastic SVD (Closed)