Spark / SPARK-26228

OOM issue encountered when computing Gramian matrix

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.3, 2.4.1, 3.0.0
    • Component/s: MLlib
    • Labels:
      None

      Description

      /**
       * Computes the Gramian matrix `A^T A`.
       *
       * @note This cannot be computed on matrices with more than 65535 columns.
       */

      As the above Scaladoc for computeGramianMatrix in RowMatrix.scala states, it supports matrices with up to 65535 columns.
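
      For reference, the quantity in question can be sketched in plain Scala. This is not Spark's implementation; the object and method names are hypothetical. It only illustrates that `A^T A` is the sum of each row's outer product with itself, so the result is n x n for n columns no matter how many rows there are.

```scala
object GramianSketch {
  /** Computes A^T A by summing the outer product of every row with itself. */
  def gramian(rows: Seq[Array[Double]]): Array[Array[Double]] = {
    require(rows.nonEmpty, "need at least one row")
    val n = rows.head.length
    // The result has n * n entries, independent of the number of rows.
    val g = Array.ofDim[Double](n, n)
    for (row <- rows) {
      require(row.length == n, "all rows must have the same length")
      var i = 0
      while (i < n) {
        var j = 0
        while (j < n) {
          g(i)(j) += row(i) * row(j)
          j += 1
        }
        i += 1
      }
    }
    g
  }

  def main(args: Array[String]): Unit = {
    // A = [[1, 2], [3, 4]], so A^T A = [[10, 14], [14, 20]].
    val g = gramian(Seq(Array(1.0, 2.0), Array(3.0, 4.0)))
    println(g.map(_.mkString(" ")).mkString("\n"))
  }
}
```

      Because the output size depends only on the column count, the memory needed for the result grows quadratically with the number of columns.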

      However, we find that it throws an OOM (Requested array size exceeds VM limit) when computing on matrices with 16000 columns.

      The root cause seems to be that treeAggregate writes a very long buffer array (about 16000*16000*8 bytes), which runs up against the JVM array size limit (2^31 - 1).
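
      The size argument can be checked with a few lines of arithmetic. The sketch below is plain Scala, not Spark code. Its object and method names are hypothetical, and it assumes 8-byte doubles. It also reports the packed upper-triangle entry count n(n+1)/2, on the assumption (not stated in this report) that a symmetric result might be stored that way.

```scala
object GramianBufferSize {
  // A JVM Double occupies 8 bytes.
  val BytesPerDouble: Long = 8L

  // Largest index value for a JVM array: 2^31 - 1.
  val ArrayLimit: Long = Int.MaxValue.toLong

  /** Bytes needed for a dense n x n matrix of doubles. */
  def denseBytes(n: Long): Long = n * n * BytesPerDouble

  /** Number of entries in the packed upper triangle of an n x n matrix. */
  def packedUpperEntries(n: Long): Long = n * (n + 1) / 2

  def main(args: Array[String]): Unit = {
    for (n <- Seq(16000L, 65535L, 65536L)) {
      println(
        s"n=$n denseBytes=${denseBytes(n)} " +
          s"packedEntries=${packedUpperEntries(n)} limit=$ArrayLimit"
      )
    }
  }
}
```

      Under these assumptions the dense figure for 16000 columns is 2,048,000,000 bytes, just under 2^31 - 1 = 2,147,483,647, so any serialization overhead or buffer growth on top of it would be enough to cross the limit. The packed entry count still fits in an Int at 65535 columns but not at 65536, which is one plausible reading of where the documented column limit comes from.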

      Does RowMatrix really support computing on matrices with up to 65535 columns?

      I suspect that computeGramianMatrix has a very serious performance issue.

      Has anyone done performance experiments on this before?

        Attachments

        1. 1.jpeg
          114 kB
          Chen Lin

              People

              • Assignee: Sean Owen (srowen)
              • Reporter: Chen Lin (hibayesian)