SystemDS / SYSTEMDS-1043

NMF implementation taking too long

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Abandoned
    • Affects Version/s: None
    • Fix Version/s: Not Applicable
    • Component/s: APIs, PythonAPI
    • Labels: None
    • Environment: standalone mode on a laptop, and a YARN cluster with 10 nodes

    Description

      I'm testing the following NMF algorithm, written using the Python API:

      # Assumes an active SparkContext `sc` and a sparse matrix `tfidf` already exist.
      from pyspark.sql import SQLContext
      import systemml as sml
      from systemml import random
      
      sqlContext = SQLContext(sc)
      sml.setSparkContext(sc)
      
      m, n = tfidf.shape
      k = 40  # rank of the factorization
      V = sml.matrix(tfidf)
      W = sml.random.uniform(size=(m, k))
      H = sml.random.uniform(size=(k, n))
      
      # Multiplicative NMF updates for H and W
      max_iters = 200
      for i in range(max_iters):
          H = H * (W.transpose().dot(V)) / (W.transpose().dot(W.dot(H)))
          W = W * (V.dot(H.transpose())) / (W.dot(H.dot(H.transpose())))
      
      # Evaluate W and fetch it as a NumPy array
      W = W.toNumPyArray()
      

      Here, tfidf is a sparse matrix of shape (114720, 11590).

      Evaluating W takes more than one hour on a laptop. On the YARN cluster, it did not finish within 1.5 hours (I killed the job).

      If I evaluate the H matrix instead, it takes only 2 minutes.

      Note that calling eval before evaluating W makes no difference: W still takes an hour.
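For comparison, the same multiplicative updates can be sketched outside SystemML with NumPy and SciPy, for example to check results on a small matrix. This is only an illustrative baseline under stated assumptions, not the SystemML implementation: the function name `nmf_multiplicative`, the `eps` guard against division by zero, and the `seed` parameter are all hypothetical additions. The sketch also regroups the H denominator as (W^T W) H, which is mathematically equivalent to W^T (W H) but avoids forming the dense m x n product. Whether SystemML applies such a rewrite itself is not something this sketch establishes.

```python
import numpy as np
import scipy.sparse as sp


def nmf_multiplicative(V, k, max_iters=200, eps=1e-9, seed=0):
    """Approximate V (m x n, sparse or dense) as W @ H using multiplicative updates.

    Hypothetical helper for illustration; eps and seed are assumptions
    that do not appear in the original script.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.uniform(size=(m, k))
    H = rng.uniform(size=(k, n))
    for _ in range(max_iters):
        # H <- H * (W^T V) / (W^T W H)
        # W^T V is computed as (V^T W)^T so the sparse matrix stays on the left.
        WtV = (V.T @ W).T                     # k x n, dense
        H = H * WtV / ((W.T @ W) @ H + eps)   # (W^T W) is only k x k
        # W <- W * (V H^T) / (W H H^T)
        VHt = V @ H.T                         # m x k, dense
        W = W * VHt / (W @ (H @ H.T) + eps)   # (H H^T) is only k x k
    return W, H


if __name__ == "__main__":
    # Small random sparse stand-in for tfidf; the real matrix is (114720, 11590).
    V = sp.random(200, 100, density=0.05, format="csr", random_state=0)
    W, H = nmf_multiplicative(V, k=10, max_iters=50)
    print(W.shape, H.shape)
    print("reconstruction error:", np.linalg.norm(V.toarray() - W @ H))
```

With this grouping, every intermediate is at most m x k, k x n, or k x k, so the per-iteration cost is dominated by the two sparse-dense products involving V.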

          People

            Assignee: Unassigned
            Reporter: Imran Younus (iyounus)