[SPARK-29244] ArrayIndexOutOfBoundsException on TaskCompletionListener during releasing of memory blocks - ASF JIRA

Attach files

Attach Screenshot

Voters

Watch issue

Watchers

Create sub-task

Link

Clone

Update Comment Author

Replace String in Comment

Update Comment Visibility

Delete Comments

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.0
Fix Version/s: 2.4.5, 3.0.0
Component/s: Spark Core
Labels:
None
Environment:

Release label:emr-5.20.0
Hadoop distribution:Amazon 2.8.5
Applications:Livy 0.5.0, Spark 2.4.0

Description

At the end of task completion an exception happened:


19/09/25 09:03:58 ERROR TaskContextImpl: Error in TaskCompletionListener19/09/25 09:03:58 ERROR TaskContextImpl: Error in TaskCompletionListenerjava.lang.ArrayIndexOutOfBoundsException: -3 at org.apache.spark.memory.TaskMemoryManager.freePage(TaskMemoryManager.java:333) at org.apache.spark.memory.MemoryConsumer.freePage(MemoryConsumer.java:130) at org.apache.spark.memory.MemoryConsumer.freeArray(MemoryConsumer.java:108) at org.apache.spark.unsafe.map.BytesToBytesMap.free(BytesToBytesMap.java:803) at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.free(UnsafeFixedWidthAggregationMap.java:225) at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.lambda$new$0(UnsafeFixedWidthAggregationMap.java:111) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117) at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:130) at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:128) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:128) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116) at org.apache.spark.scheduler.Task.run(Task.scala:131) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

Important to note, that before this one, there was OOM of allocating some pages. It looks like everything related to each other, but on OOM the whole flow goes abnormally, so no resources are fried correctly.


java.lang.NullPointerExceptionjava.lang.NullPointerException at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getMemoryUsage(UnsafeInMemorySorter.java:208) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.getMemoryUsage(UnsafeExternalSorter.java:249) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.updatePeakMemoryUsed(UnsafeExternalSorter.java:253) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.freeMemory(UnsafeExternalSorter.java:296) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.cleanupResources(UnsafeExternalSorter.java:328) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.lambda$new$0(UnsafeExternalSorter.java:178) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117) at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:130) at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:128) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:128) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116) at org.apache.spark.scheduler.Task.run(Task.scala:131) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

This is must be something with job planning, but taking so many exceptions into account doesn't make things easier. Would be happy to provide more details.