Uploaded image for project: 'Mahout'
  1. Mahout
  2. MAHOUT-986

OutOfMemoryError in LanczosState by way of SpectralKMeans

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 0.6
    • 0.7
    • classic
    • None
    • Ubuntu 11.10 (64-bit)

    Description

      Dan Brickley and I have been testing SpectralKMeans with a dbpedia dataset ( http://danbri.org/2012/spectral/dbpedia/ ); effectively, a graph with 4,192,499 nodes. Not surprisingly, the LanczosSolver throws an OutOfMemoryError when it attempts to instantiate a DenseMatrix of dimensions 4192499-by-4192499 (~17.5 trillion double-precision floating point values). Here's the full stack trace:

      Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
      at org.apache.mahout.math.DenseMatrix.<init>(DenseMatrix.java:50)
      at org.apache.mahout.math.decomposer.lanczos.LanczosState.<init>(LanczosState.java:45)
      at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:146)
      at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:86)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
      at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.main(SpectralKMeansDriver.java:53)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:616)
      at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
      at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
      at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:616)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

      Obviously SKM needs a more sustainable and memory-efficient way of performing an eigen-decomposition of the graph laplacian. For those who are more knowledgeable in the linear systems solvers of Mahout than I, can the Lanczos parameters be tweaked to negate the requirement of a full DenseMatrix? Or should SKM move to SSVD instead?

      Attachments

        1. MAHOUT-986.patch
          0.9 kB
          Shannon Quinn

        Activity

          People

            magsol Shannon Quinn
            magsol Shannon Quinn
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: