Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Abandoned
- Affects Version/s: None
- Fix Version/s: None
- Environment: standalone mode on laptop, and YARN cluster with 10 nodes
Description
I'm testing the following NMF algorithm written using the Python API:
from pyspark.sql import SQLContext
import systemml as sml
from systemml import random

sqlContext = SQLContext(sc)
sml.setSparkContext(sc)

m, n = tfidf.shape
k = 40
V = sml.matrix(tfidf)
W = sml.random.uniform(size=(m, k))
H = sml.random.uniform(size=(k, n))

max_iters = 200
for i in range(max_iters):
    H = H * (W.transpose().dot(V)) / (W.transpose().dot(W.dot(H)))
    W = W * (V.dot(H.transpose())) / (W.dot(H.dot(H.transpose())))

W = W.toNumPyArray()
Here tfidf is a sparse matrix of shape (114720, 11590).
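For reference, the same multiplicative update rules can be sketched in pure NumPy as a single-machine baseline for timing comparisons. This is only a sketch: the dimensions below are small stand-ins for the real tfidf matrix, the random data and the epsilon guard against division by zero are my additions, and it does not use SystemML at all.

```python
import numpy as np

# Pure-NumPy sketch of the same multiplicative NMF updates.
# Small random matrices stand in for the (114720, 11590) tfidf matrix.
rng = np.random.default_rng(0)
m, n, k = 200, 150, 10
V = rng.random((m, n))   # stand-in for tfidf (nonnegative entries)
W = rng.random((m, k))
H = rng.random((k, n))

err0 = np.linalg.norm(V - W.dot(H))  # initial reconstruction error
eps = 1e-9                           # assumption: guard against divide-by-zero
for _ in range(50):
    H = H * (W.T.dot(V)) / (W.T.dot(W.dot(H)) + eps)
    W = W * (V.dot(H.T)) / (W.dot(H.dot(H.T)) + eps)
err = np.linalg.norm(V - W.dot(H))   # error decreases under these updates
```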
The evaluation of W takes more than one hour when running on a laptop. On the YARN cluster, it didn't finish within 1.5 hours (I killed the job).
If I evaluate the H matrix instead, it takes only about 2 minutes.
Note that calling eval before evaluating W makes no difference: W still takes about an hour.