Description
/**
* Computes the Gramian matrix `A^T A`.
*
* @note This cannot be computed on matrices with more than 65535 columns.
*/
As the Scaladoc above for computeGramianMatrix in RowMatrix.scala says, it supports matrices with up to 65535 columns.
However, we find that it throws an OOM (Requested array size exceeds VM limit) when computing on a matrix with 16000 columns.
The root cause seems to be that treeAggregate writes a very long buffer array (16000*16000*8 bytes), which runs into the JVM array size limit (2^31 - 1).
Does RowMatrix really support computing on matrices with up to 65535 columns?
I suspect that computeGramianMatrix has a serious performance issue.
Has anyone run performance experiments on this before?
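To make the size estimate above concrete, here is a rough sketch of the arithmetic. It is not taken from the Spark source: the object and method names are hypothetical, 8 bytes per Double is assumed, and the packed upper-triangular layout (n*(n+1)/2 entries) is included only as an assumed point of comparison for where a 65535-column bound could come from.

```scala
// Back-of-the-envelope buffer sizes for an n-column Gramian matrix.
// Assumptions (not confirmed by the issue): 8 bytes per Double; "dense" means
// a full n x n array as in the 16000*16000*8 estimate; "packed" means only the
// upper triangle including the diagonal is stored.
object GramianBufferSize {
  val BytesPerDouble: Long = 8L

  // Bytes needed for a dense n x n matrix of doubles.
  def denseBytes(n: Long): Long = n * n * BytesPerDouble

  // Number of entries in a packed upper triangle, diagonal included.
  def packedEntries(n: Long): Long = n * (n + 1) / 2

  // Bytes needed for the packed upper triangle.
  def packedBytes(n: Long): Long = packedEntries(n) * BytesPerDouble

  def main(args: Array[String]): Unit = {
    // Int.MaxValue (2^31 - 1) is the upper bound on a JVM array length.
    for (n <- Seq(16000L, 65535L)) {
      println(
        s"n=$n denseBytes=${denseBytes(n)} " +
          s"packedEntries=${packedEntries(n)} packedBytes=${packedBytes(n)} " +
          s"maxArrayLength=${Int.MaxValue}"
      )
    }
  }
}
```

Under these assumptions, the entry count for 65535 columns still fits in a single array of doubles, while the serialized byte size for 16000 columns is already close to 2^31 - 1. A byte buffer that grows by doubling its capacity could therefore request more than the limit before the data itself exceeds it, which would be consistent with the ByteArrayOutputStream.hugeCapacity frame in the duplicate issue, though this is a guess rather than a confirmed diagnosis.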
Issue Links
- is duplicated by
  - SPARK-27069 Spark(2.3.2) LDA transfomation memory error(java.lang.OutOfMemoryError at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:1232 - Resolved
- links to