Spark / SPARK-12155

Execution OOM after a relatively large dataset is cached in the cluster.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: Spark Core, SQL
    • Labels: None

    Description

      I have a cluster with about 80 GB of memory. I cached a 43 GB DataFrame. When I started to run the query, I got the following exception (I added more logging to the code).

      15/12/05 00:33:43 INFO UnifiedMemoryManager: Creating UnifedMemoryManager for 4 cores with 16929521664 maxMemory, 8464760832 storageRegionSize.
      
      
      15/12/05 01:20:50 INFO MemoryStore: Ensuring 1048576 bytes of free space for block rdd_94_37(free: 3253659951, max: 16798973952)
      15/12/05 01:20:50 INFO MemoryStore: Ensuring 5142008 bytes of free space for block rdd_94_37(free: 3252611375, max: 16798973952)
      15/12/05 01:20:50 INFO Executor: Finished task 36.0 in stage 4.0 (TID 109). 3028 bytes result sent to driver
      15/12/05 01:20:50 INFO MemoryStore: Ensuring 98948238 bytes of free space for block rdd_94_37(free: 3314840375, max: 16866344960)
      15/12/05 01:20:50 INFO MemoryStore: Ensuring 98675713 bytes of free space for block rdd_94_37(free: 3215892137, max: 16866344960)
      15/12/05 01:20:50 INFO MemoryStore: Ensuring 197347565 bytes of free space for block rdd_94_37(free: 3117216424, max: 16866344960)
      15/12/05 01:20:50 INFO MemoryStore: Ensuring 295995553 bytes of free space for block rdd_94_37(free: 2919868859, max: 16866344960)
      15/12/05 01:20:51 INFO MemoryStore: Ensuring 394728479 bytes of free space for block rdd_94_37(free: 2687050010, max: 16929521664)
      15/12/05 01:20:51 INFO Executor: Finished task 32.0 in stage 4.0 (TID 106). 3028 bytes result sent to driver
      15/12/05 01:20:51 INFO MemoryStore: Ensuring 591258816 bytes of free space for block rdd_94_37(free: 2292321531, max: 16929521664)
      15/12/05 01:20:51 INFO MemoryStore: Ensuring 901645182 bytes of free space for block rdd_94_37(free: 1701062715, max: 16929521664)
      15/12/05 01:20:52 INFO MemoryStore: Ensuring 1302179076 bytes of free space for block rdd_94_37(free: 799417533, max: 16929521664)
      15/12/05 01:20:52 INFO MemoryStore: Will not store rdd_94_37 as it would require dropping another block from the same RDD
      15/12/05 01:20:52 WARN MemoryStore: Not enough space to cache rdd_94_37 in memory! (computed 2.4 GB so far)
      15/12/05 01:20:52 INFO MemoryStore: Memory use = 12.6 GB (blocks) + 2.4 GB (scratch space shared across 13 tasks(s)) = 15.0 GB. Storage limit = 15.8 GB.
      15/12/05 01:20:52 INFO BlockManager: Found block rdd_94_37 locally
      15/12/05 01:20:52 INFO UnifiedMemoryManager: Try to acquire 262144 bytes memory. But, on-heap execution memory poll only has 0 bytes free memory.
      15/12/05 01:20:52 INFO UnifiedMemoryManager: memoryReclaimableFromStorage 8464760832, storageMemoryPool.poolSize 16929521664, storageRegionSize 8464760832.
      15/12/05 01:20:52 INFO UnifiedMemoryManager: Try to reclaim memory space from storage memory pool.
      15/12/05 01:20:52 INFO StorageMemoryPool: Claiming 262144 bytes free memory space from StorageMemoryPool.
      15/12/05 01:20:52 INFO UnifiedMemoryManager: Reclaimed 262144 bytes of memory from storage memory pool.Adding them back to onHeapExecutionMemoryPool.
      15/12/05 01:20:52 INFO UnifiedMemoryManager: Try to acquire 67108864 bytes memory. But, on-heap execution memory poll only has 0 bytes free memory.
      15/12/05 01:20:52 INFO UnifiedMemoryManager: memoryReclaimableFromStorage 8464498688, storageMemoryPool.poolSize 16929259520, storageRegionSize 8464760832.
      15/12/05 01:20:52 INFO UnifiedMemoryManager: Try to reclaim memory space from storage memory pool.
      15/12/05 01:20:52 INFO StorageMemoryPool: Claiming 67108864 bytes free memory space from StorageMemoryPool.
      15/12/05 01:20:52 INFO UnifiedMemoryManager: Reclaimed 67108864 bytes of memory from storage memory pool.Adding them back to onHeapExecutionMemoryPool.
      15/12/05 01:20:54 INFO Executor: Finished task 37.0 in stage 4.0 (TID 110). 3077 bytes result sent to driver
      15/12/05 01:20:56 INFO CoarseGrainedExecutorBackend: Got assigned task 120
      15/12/05 01:20:56 INFO Executor: Running task 1.0 in stage 5.0 (TID 120)
      15/12/05 01:20:56 INFO CoarseGrainedExecutorBackend: Got assigned task 124
      15/12/05 01:20:56 INFO CoarseGrainedExecutorBackend: Got assigned task 128
      15/12/05 01:20:56 INFO CoarseGrainedExecutorBackend: Got assigned task 132
      15/12/05 01:20:56 INFO Executor: Running task 9.0 in stage 5.0 (TID 128)
      15/12/05 01:20:56 INFO Executor: Running task 13.0 in stage 5.0 (TID 132)
      15/12/05 01:20:56 INFO Executor: Running task 5.0 in stage 5.0 (TID 124)
      15/12/05 01:20:56 INFO MapOutputTrackerWorker: Updating epoch to 2 and clearing cache
      15/12/05 01:20:56 INFO TorrentBroadcast: Started reading broadcast variable 6
      15/12/05 01:20:56 INFO MemoryStore: Ensuring 9471 bytes of free space for block broadcast_6_piece0(free: 3384207663, max: 16929521664)
      15/12/05 01:20:56 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 9.2 KB, free 12.6 GB)
      15/12/05 01:20:56 INFO TorrentBroadcast: Reading broadcast variable 6 took 5 ms
      15/12/05 01:20:56 INFO MemoryStore: Ensuring 1048576 bytes of free space for block broadcast_6(free: 3384198192, max: 16929521664)
      15/12/05 01:20:56 INFO MemoryStore: Ensuring 22032 bytes of free space for block broadcast_6(free: 3384198192, max: 16929521664)
      15/12/05 01:20:56 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 21.5 KB, free 12.6 GB)
      15/12/05 01:20:56 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 1, fetching them
      15/12/05 01:20:56 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 1, fetching them
      15/12/05 01:20:56 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 1, fetching them
      15/12/05 01:20:56 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 1, fetching them
      15/12/05 01:20:56 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@10.0.202.130:56969)
      15/12/05 01:20:56 INFO MapOutputTrackerWorker: Got the output locations
      15/12/05 01:20:56 INFO ShuffleBlockFetcherIterator: Getting 43 non-empty blocks out of 43 blocks
      15/12/05 01:20:56 INFO ShuffleBlockFetcherIterator: Getting 43 non-empty blocks out of 43 blocks
      15/12/05 01:20:56 INFO ShuffleBlockFetcherIterator: Getting 43 non-empty blocks out of 43 blocks
      15/12/05 01:20:56 INFO ShuffleBlockFetcherIterator: Getting 43 non-empty blocks out of 43 blocks
      15/12/05 01:20:56 INFO ShuffleBlockFetcherIterator: Started 3 remote fetches in 41 ms
      15/12/05 01:20:56 INFO ShuffleBlockFetcherIterator: Started 3 remote fetches in 41 ms
      15/12/05 01:20:56 INFO ShuffleBlockFetcherIterator: Started 3 remote fetches in 40 ms
      15/12/05 01:20:56 INFO ShuffleBlockFetcherIterator: Started 3 remote fetches in 41 ms
      15/12/05 01:20:56 INFO UnifiedMemoryManager: Try to acquire 67108864 bytes memory. But, on-heap execution memory poll only has 66846720 bytes free memory.
      15/12/05 01:20:56 INFO UnifiedMemoryManager: memoryReclaimableFromStorage 8397389824, storageMemoryPool.poolSize 16862150656, storageRegionSize 8464760832.
      15/12/05 01:20:56 INFO UnifiedMemoryManager: Try to reclaim memory space from storage memory pool.
      15/12/05 01:20:56 INFO StorageMemoryPool: Claiming 262144 bytes free memory space from StorageMemoryPool.
      15/12/05 01:20:56 INFO UnifiedMemoryManager: Reclaimed 262144 bytes of memory from storage memory pool.Adding them back to onHeapExecutionMemoryPool.
      15/12/05 01:20:56 INFO UnifiedMemoryManager: Try to acquire 67108864 bytes memory. But, on-heap execution memory poll only has 33554432 bytes free memory.
      15/12/05 01:20:56 INFO UnifiedMemoryManager: memoryReclaimableFromStorage 8397127680, storageMemoryPool.poolSize 16861888512, storageRegionSize 8464760832.
      15/12/05 01:20:56 INFO UnifiedMemoryManager: Try to reclaim memory space from storage memory pool.
      15/12/05 01:20:56 INFO StorageMemoryPool: Claiming 33554432 bytes free memory space from StorageMemoryPool.
      15/12/05 01:20:56 INFO UnifiedMemoryManager: Reclaimed 33554432 bytes of memory from storage memory pool.Adding them back to onHeapExecutionMemoryPool.
      15/12/05 01:20:56 INFO GenerateMutableProjection: Code generated in 9.602791 ms
      15/12/05 01:20:56 INFO GenerateMutableProjection: Code generated in 12.7135 ms
      15/12/05 01:20:56 INFO Executor: Finished task 13.0 in stage 5.0 (TID 132). 2271 bytes result sent to driver
      15/12/05 01:20:56 INFO Executor: Finished task 9.0 in stage 5.0 (TID 128). 2320 bytes result sent to driver
      15/12/05 01:20:56 INFO CoarseGrainedExecutorBackend: Got assigned task 136
      15/12/05 01:20:56 INFO CoarseGrainedExecutorBackend: Got assigned task 137
      15/12/05 01:20:56 INFO Executor: Running task 17.0 in stage 5.0 (TID 136)
      15/12/05 01:20:56 INFO ShuffleBlockFetcherIterator: Getting 43 non-empty blocks out of 43 blocks
      15/12/05 01:20:56 INFO ShuffleBlockFetcherIterator: Started 3 remote fetches in 1 ms
      15/12/05 01:20:56 INFO UnifiedMemoryManager: Try to acquire 67108864 bytes memory. But, on-heap execution memory poll only has 16515072 bytes free memory.
      15/12/05 01:20:56 INFO UnifiedMemoryManager: memoryReclaimableFromStorage 8363573248, storageMemoryPool.poolSize 16828334080, storageRegionSize 8464760832.
      15/12/05 01:20:56 INFO UnifiedMemoryManager: Try to reclaim memory space from storage memory pool.
      15/12/05 01:20:56 INFO StorageMemoryPool: Claiming 50593792 bytes free memory space from StorageMemoryPool.
      15/12/05 01:20:56 INFO UnifiedMemoryManager: Reclaimed 50593792 bytes of memory from storage memory pool.Adding them back to onHeapExecutionMemoryPool.
      15/12/05 01:20:56 INFO Executor: Running task 18.0 in stage 5.0 (TID 137)
      15/12/05 01:20:56 INFO GenerateUnsafeProjection: Code generated in 30.25836 ms
      15/12/05 01:20:56 INFO ShuffleBlockFetcherIterator: Getting 43 non-empty blocks out of 43 blocks
      15/12/05 01:20:56 INFO ShuffleBlockFetcherIterator: Started 3 remote fetches in 2 ms
      15/12/05 01:20:56 INFO UnifiedMemoryManager: Try to acquire 67108864 bytes memory. But, on-heap execution memory poll only has 16515072 bytes free memory.
      15/12/05 01:20:56 INFO UnifiedMemoryManager: memoryReclaimableFromStorage 8312979456, storageMemoryPool.poolSize 16777740288, storageRegionSize 8464760832.
      15/12/05 01:20:56 INFO UnifiedMemoryManager: Try to reclaim memory space from storage memory pool.
      15/12/05 01:20:56 INFO StorageMemoryPool: Claiming 50593792 bytes free memory space from StorageMemoryPool.
      15/12/05 01:20:56 INFO UnifiedMemoryManager: Reclaimed 50593792 bytes of memory from storage memory pool.Adding them back to onHeapExecutionMemoryPool.
      15/12/05 01:20:56 INFO GenerateUnsafeRowJoiner: Code generated in 19.615021 ms
      15/12/05 01:20:57 INFO GenerateUnsafeProjection: Code generated in 23.149594 ms
      15/12/05 01:20:57 INFO TaskMemoryManager: Memory used in task 136
      15/12/05 01:20:57 INFO TaskMemoryManager: Acquired by org.apache.spark.unsafe.map.BytesToBytesMap@5ac6b585: 48.3 MB
      15/12/05 01:20:57 INFO TaskMemoryManager: 0 bytes of memory were used by task 136 but are not associated with specific consumers
      15/12/05 01:20:57 INFO TaskMemoryManager: 185597952 bytes of memory are used for execution and 13545345504 bytes of memory are used for storage
      15/12/05 01:20:57 INFO TaskMemoryManager: Memory used in task 124
      15/12/05 01:20:57 INFO TaskMemoryManager: Acquired by org.apache.spark.unsafe.map.BytesToBytesMap@30015a6a: 48.3 MB
      15/12/05 01:20:57 INFO TaskMemoryManager: 0 bytes of memory were used by task 124 but are not associated with specific consumers
      15/12/05 01:20:57 INFO TaskMemoryManager: 185597952 bytes of memory are used for execution and 13545345504 bytes of memory are used for storage
      15/12/05 01:20:57 INFO UnifiedMemoryManager: Try to acquire 67108864 bytes memory. But, on-heap execution memory poll only has 16515072 bytes free memory.
      15/12/05 01:20:57 INFO UnifiedMemoryManager: memoryReclaimableFromStorage 8262385664, storageMemoryPool.poolSize 16727146496, storageRegionSize 8464760832.
      15/12/05 01:20:57 INFO UnifiedMemoryManager: Try to reclaim memory space from storage memory pool.
      15/12/05 01:20:57 INFO StorageMemoryPool: Claiming 50593792 bytes free memory space from StorageMemoryPool.
      15/12/05 01:20:57 INFO UnifiedMemoryManager: Reclaimed 50593792 bytes of memory from storage memory pool.Adding them back to onHeapExecutionMemoryPool.
      15/12/05 01:20:57 INFO TaskMemoryManager: Memory used in task 137
      15/12/05 01:20:57 INFO TaskMemoryManager: Acquired by org.apache.spark.unsafe.map.BytesToBytesMap@a9691e0: 48.3 MB
      15/12/05 01:20:57 WARN TaskMemoryManager: leak 48.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@5ac6b585
      15/12/05 01:20:57 INFO TaskMemoryManager: 0 bytes of memory were used by task 137 but are not associated with specific consumers
      15/12/05 01:20:57 INFO TaskMemoryManager: 215023616 bytes of memory are used for execution and 13545345504 bytes of memory are used for storage
      15/12/05 01:20:57 WARN TaskMemoryManager: leak 48.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@a9691e0
      15/12/05 01:20:57 ERROR Executor: Managed memory leak detected; size = 50593792 bytes, TID = 136
      15/12/05 01:20:57 ERROR Executor: Managed memory leak detected; size = 50593792 bytes, TID = 137
      15/12/05 01:20:57 WARN TaskMemoryManager: leak 48.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@30015a6a
      15/12/05 01:20:57 ERROR Executor: Managed memory leak detected; size = 50593792 bytes, TID = 124
      15/12/05 01:20:57 ERROR Executor: Exception in task 18.0 in stage 5.0 (TID 137)
      java.lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got 0
      	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:91)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap.java:735)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:197)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:212)
      	at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.<init>(UnsafeFixedWidthAggregationMap.java:103)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:483)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
      	at org.apache.spark.scheduler.Task.run(Task.scala:89)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      15/12/05 01:20:57 ERROR Executor: Exception in task 17.0 in stage 5.0 (TID 136)
      java.lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got 0
      	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:91)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap.java:735)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:197)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:212)
      	at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.<init>(UnsafeFixedWidthAggregationMap.java:103)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:483)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
      	at org.apache.spark.scheduler.Task.run(Task.scala:89)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      15/12/05 01:20:57 ERROR Executor: Exception in task 5.0 in stage 5.0 (TID 124)
      java.lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got 0
      	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:91)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap.java:735)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:197)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:212)
      	at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.<init>(UnsafeFixedWidthAggregationMap.java:103)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:483)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
      	at org.apache.spark.scheduler.Task.run(Task.scala:89)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      15/12/05 01:20:57 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-4,5,main]
      java.lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got 0
      	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:91)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap.java:735)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:197)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:212)
      	at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.<init>(UnsafeFixedWidthAggregationMap.java:103)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:483)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
      	at org.apache.spark.scheduler.Task.run(Task.scala:89)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      15/12/05 01:20:57 INFO DiskBlockManager: Shutdown hook called
      15/12/05 01:20:57 INFO GenerateMutableProjection: Code generated in 21.666344 ms
      15/12/05 01:20:57 DEBUG KeepAliveThread: KeepAliveThread received command: Shutdown
      15/12/05 01:20:57 ERROR SparkUncaughtExceptionHandler: [Container in shutdown] Uncaught exception in thread Thread[Executor task launch worker-6,5,main]
      java.lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got 0
      	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:91)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap.java:735)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:197)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:212)
      	at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.<init>(UnsafeFixedWidthAggregationMap.java:103)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:483)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
      	at org.apache.spark.scheduler.Task.run(Task.scala:89)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      15/12/05 01:20:57 ERROR SparkUncaughtExceptionHandler: [Container in shutdown] Uncaught exception in thread Thread[Executor task launch worker-7,5,main]
      java.lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got 0
      	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:91)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap.java:735)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:197)
      	at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:212)
      	at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.<init>(UnsafeFixedWidthAggregationMap.java:103)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:483)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
      	at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
      	at org.apache.spark.scheduler.Task.run(Task.scala:89)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      15/12/05 01:20:57 INFO KeepAliveThread: KeepAlive thread has been shutdown successfully
      15/12/05 01:20:57 WARN TaskMemoryManager: leak 28.1 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@6feafdad
      15/12/05 01:20:57 ERROR Executor: Managed memory leak detected; size = 29425664 bytes, TID = 120
      15/12/05 01:20:57 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 120)
      java.io.FileNotFoundException: /local_disk/spark-1ebb23ad-e3a1-4af2-b3d0-58a70ceed7ec/executor-ca2c389d-8b67-487f-b175-b867282bf0a3/blockmgr-deda3833-d86c-4850-aa4f-64c26ebfbc4f/08/temp_shuffle_8b5df98d-701c-4ef3-98cc-9e4731fe4a68 (No such file or directory)
      	at java.io.FileOutputStream.open0(Native Method)
      	at java.io.FileOutputStream.open(FileOutputStream.java:270)
      	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
      	at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
      	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
      	at org.apache.spark.scheduler.Task.run(Task.scala:89)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      15/12/05 01:20:57 INFO ShutdownHookManager: Shutdown hook called
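
The memory accounting in the log above can be checked by hand. The following is a minimal sketch, not Spark code: it replays a few of the logged numbers under two assumptions inferred only from the log values (not from the Spark source), namely that memoryReclaimableFromStorage equals storageMemoryPool.poolSize minus storageRegionSize, and that each "Claiming N bytes" line is the shortfall between the requested size and the free execution memory. The function names are hypothetical.

```python
# Values copied from the executor log above (all in bytes).
MAX_MEMORY = 16929521664
STORAGE_REGION_SIZE = 8464760832


def reclaimable_from_storage(pool_size, storage_region_size=STORAGE_REGION_SIZE):
    """Assumed relation: what the storage pool holds beyond its own region."""
    return pool_size - storage_region_size


def shortfall(requested, execution_free):
    """Bytes still needed after the execution pool's free memory is used."""
    return max(requested - execution_free, 0)


# (storageMemoryPool.poolSize, logged memoryReclaimableFromStorage)
logged_reclaimable = [
    (16929521664, 8464760832),
    (16929259520, 8464498688),
    (16862150656, 8397389824),
    (16828334080, 8363573248),
]

# (requested, free in execution pool, logged "Claiming N bytes")
logged_claims = [
    (262144, 0, 262144),
    (67108864, 0, 67108864),
    (67108864, 66846720, 262144),
    (67108864, 33554432, 33554432),
    (67108864, 16515072, 50593792),
]

reclaimable_ok = all(
    reclaimable_from_storage(pool) == logged for pool, logged in logged_reclaimable
)
claims_ok = all(
    shortfall(requested, free) == claimed for requested, free, claimed in logged_claims
)

# From the TaskMemoryManager line logged just before the OOM:
# "185597952 bytes ... used for execution and 13545345504 bytes ... used for storage".
execution_used = 185597952
storage_used = 13545345504
unused = MAX_MEMORY - (execution_used + storage_used)

print(reclaimable_ok, claims_ok, unused)
```

If these assumptions hold, the logged pool sizes and claims are internally consistent, and the last figure suggests that execution plus storage usage was still below maxMemory when the "Unable to acquire 262144 bytes of memory, got 0" errors were raised.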
      

      The query plan looked like this:

      TungstenAggregate4
      +- TungstenExchange2
         +- TungstenAggregate3
            +- TungstenAggregate2
               +- TungstenExchange1
                  +- TungstenAggregate1
                     +- Project 
                        +- InMemoryColumnarTableScan
      

      The OOM happened in the stage containing TungstenAggregate2 and TungstenAggregate3.
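
To make the stage boundaries in the plan above explicit, here is a small illustrative sketch. It is not Spark's scheduler; it relies on the simplifying assumption that every exchange operator marks a shuffle boundary, so the operators between two exchanges run together in the same tasks. The helper name split_into_stages is hypothetical.

```python
# Physical plan from the report, listed bottom-up (leaf operator first).
PLAN_BOTTOM_UP = [
    "InMemoryColumnarTableScan",
    "Project",
    "TungstenAggregate1",
    "TungstenExchange1",
    "TungstenAggregate2",
    "TungstenAggregate3",
    "TungstenExchange2",
    "TungstenAggregate4",
]


def split_into_stages(plan_bottom_up):
    """Group operators into stages, starting a new group at each exchange.

    Assumption: an exchange is a shuffle boundary. The exchange operators
    themselves are left out of the resulting groups.
    """
    stages = [[]]
    for operator in plan_bottom_up:
        if "Exchange" in operator:
            stages.append([])
        else:
            stages[-1].append(operator)
    return stages


stages = split_into_stages(PLAN_BOTTOM_UP)
for index, operators in enumerate(stages):
    print(index, operators)
```

Under that assumption the middle group holds TungstenAggregate2 and TungstenAggregate3, which matches the stage where the OOM was observed: each task in that stage runs both aggregate operators.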

            People

              Assignee: Andrew Or (andrewor14)
              Reporter: Yin Huai (yhuai)