SPARK-24986

OOM in BufferHolder during writes to a stream


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.1.0, 2.2.0, 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels:

      Description

      We have seen an out-of-memory error while running one of our production jobs. We expected memory allocation to be managed by the unified memory manager at run time.

      The buffer that grows during the write behaves roughly like this: if the row length is constant, the buffer does not grow; it is reset and the values are written into it again for each row. If the rows vary in length, are skewed, and some of them are very large, the buffer keeps growing until this error occurs. I suspect the estimator that requests the initial execution memory does not account for this growth. Checking the available heap before growing the global buffer might be a viable option.
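      The growth pattern described above can be illustrated with a minimal, hypothetical Java sketch. The class `GrowableRowBuffer`, its methods, and the doubling policy are assumptions made for illustration only; this is not Spark's `BufferHolder` API. The heap check before growing shows one way the suggestion in this report could look, using only the approximate figures reported by `java.lang.Runtime`.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, NOT Spark's BufferHolder: a per-row byte buffer that is
 * reset between rows and only grows when a single row needs more room.
 */
public class GrowableRowBuffer {
    private byte[] buffer;
    private int cursor = 0;

    public GrowableRowBuffer(int initialSize) {
        buffer = new byte[initialSize];
    }

    /** Called before each row: reuses the existing array, so same-size rows never grow it. */
    public void reset() {
        cursor = 0;
    }

    /** Ensures room for neededSize more bytes, doubling capacity when it does not fit. */
    public void grow(int neededSize) {
        long totalSize = (long) cursor + neededSize;
        if (totalSize <= buffer.length) {
            return; // fits in the current array, no allocation needed
        }
        long newLength = Math.min(totalSize * 2, Integer.MAX_VALUE - 8);
        if (newLength < totalSize) {
            throw new UnsupportedOperationException("Row too large: " + totalSize + " bytes");
        }
        // Assumed heap check (the report's suggestion): compare the request with the
        // headroom the JVM reports instead of letting new byte[] throw OutOfMemoryError.
        // This is only an estimate: it ignores garbage that a GC could still reclaim.
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long headroom = rt.maxMemory() - used;
        if (newLength > headroom) {
            throw new IllegalStateException("Refusing to grow buffer to " + newLength
                + " bytes; estimated heap headroom is " + headroom + " bytes");
        }
        // The array is never shrunk afterwards, so one huge row keeps the large allocation.
        buffer = Arrays.copyOf(buffer, (int) newLength);
    }

    public void write(byte[] bytes) {
        grow(bytes.length);
        System.arraycopy(bytes, 0, buffer, cursor, bytes.length);
        cursor += bytes.length;
    }

    public int capacity() {
        return buffer.length;
    }

    public static void main(String[] args) {
        GrowableRowBuffer holder = new GrowableRowBuffer(64);
        // Constant-length rows: reset() reuses the same 64-byte array every time.
        for (int i = 0; i < 1000; i++) {
            holder.reset();
            holder.write(new byte[32]);
        }
        System.out.println("after fixed rows: " + holder.capacity()); // 64
        // One skewed, much larger row forces the array to grow (1000 bytes, doubled).
        holder.reset();
        holder.write(new byte[1000]);
        System.out.println("after skewed row: " + holder.capacity()); // 2000
    }
}
```

      In this sketch the fixed-size rows leave the capacity at 64 bytes, while the single 1000-byte row grows it to 2000 bytes. A pre-growth check like this turns an uncontrolled `OutOfMemoryError` into an explicit failure, but because `Runtime` only reports approximate, non-contiguous free space, it cannot guarantee the allocation will succeed.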

      java.lang.OutOfMemoryError: Java heap space
      at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:73)
      at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter.initialize(UnsafeArrayWriter.java:61)
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply_1$(Unknown Source)
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
      at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateResultProjection$1.apply(AggregationIterator.scala:232)
      at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateResultProjection$1.apply(AggregationIterator.scala:221)
      at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.next(SortBasedAggregationIterator.scala:159)
      at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.next(SortBasedAggregationIterator.scala:29)
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
      at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
      at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
      at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
      at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
      at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:1075)
      at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:1091)
      at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1129)
      at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1132)
      at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
      at scala.collection.Iterator$class.foreach(Iterator.scala:893)
      at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
      at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:513)
      at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:329)
      at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1966)
      at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:270)
      18/06/11 21:18:41 ERROR SparkUncaughtExceptionHandler: [Container in shutdown] Uncaught exception in thread Thread[stdout writer for Python/bin/python3.6,5,main]


            People

            • Assignee: Unassigned
            • Reporter: Sanket Reddy (sanket991)
            • Votes: 1
            • Watchers: 6
