SPARK-26228

OOM issue encountered when computing Gramian matrix


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.3, 2.4.1, 3.0.0
    • Component/s: MLlib
    • Labels: None

    Description

      /**
       * Computes the Gramian matrix `A^T A`.
       *
       * @note This cannot be computed on matrices with more than 65535 columns.
       */

      According to the doc comment above on computeGramianMatrix in RowMatrix.scala, the method supports matrices with up to 65535 columns.
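One plausible reading of the 65535 figure can be checked with plain arithmetic. The sketch below assumes the Gramian is accumulated as a packed upper triangle of n(n+1)/2 doubles held in a single JVM array. The object and method names are hypothetical and are not part of Spark.

```scala
object GramianColumnLimit {
  // Number of entries in the packed upper triangle of an n x n symmetric matrix.
  // Long arithmetic is used so that the illustration itself cannot overflow.
  def packedTriangleSize(n: Long): Long = n * (n + 1) / 2

  // A JVM array is indexed by Int, so it can hold at most Int.MaxValue (2^31 - 1) elements.
  def fitsInJvmArray(n: Long): Boolean =
    packedTriangleSize(n) <= Int.MaxValue.toLong

  def main(args: Array[String]): Unit = {
    for (n <- Seq(16000L, 65535L, 65536L)) {
      println(s"n=$n packedEntries=${packedTriangleSize(n)} fitsInArray=${fitsInJvmArray(n)}")
    }
  }
}
```

Under that assumption, 65535 is the largest column count whose packed triangle is still indexable by an Int. This bounds only the element count. It says nothing about whether the heap or a serialization buffer can hold the result.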

      However, we find that it throws an OOM (Requested array size exceeds VM limit) when computing on a matrix with 16000 columns.

      The root cause seems to be that treeAggregate writes a very long buffer array (about 16000 * 16000 * 8 bytes), which runs up against the JVM array size limit (2^31 - 1).
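The size estimate in the report can be worked through as a rough sketch. The doubling growth policy below is an assumption made for illustration and is not taken from the Spark source. The names are hypothetical.

```scala
object GramianBufferSize {
  val BytesPerDouble: Long = 8L
  val JvmArrayLimit: Long = Int.MaxValue.toLong // 2^31 - 1 = 2147483647

  // Size in bytes of a dense n x n matrix of doubles (the 16000 * 16000 * 8 figure).
  def denseBytes(n: Long): Long = n * n * BytesPerDouble

  // ASSUMPTION: a growable byte buffer that doubles its capacity until the payload fits.
  // Returns the capacity it would end up requesting.
  def doubledCapacity(payloadBytes: Long, initial: Long = 32L): Long = {
    var capacity = initial
    while (capacity < payloadBytes) capacity *= 2
    capacity
  }

  def main(args: Array[String]): Unit = {
    val payload = denseBytes(16000L)
    println(s"payload bytes:      $payload")
    println(s"JVM array limit:    $JvmArrayLimit")
    println(s"doubled capacity:   ${doubledCapacity(payload)}")
    println(s"request over limit: ${doubledCapacity(payload) > JvmArrayLimit}")
  }
}
```

The raw figure of 2,048,000,000 bytes is slightly below 2^31 - 1. Under the assumed doubling policy, though, the next capacity requested is 2^31 bytes, one more than a single JVM byte array can hold.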

      Does RowMatrix really support computing on matrices with up to 65535 columns?

      I suspect that computeGramianMatrix has a very serious performance issue.

      Has anyone run performance experiments on this before?

      Attachments

        1. 1.jpeg
          114 kB
          Chen Lin

            People

              Assignee: Sean R. Owen (srowen)
              Reporter: Chen Lin (hibayesian)
              Votes: 0
              Watchers: 3
