[SPARK-25139] PythonRunner#WriterThread released block after TaskRunner finally block which invoke BlockManager#releaseAllLocksForTask - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.1
Fix Version/s: 2.3.4, 2.4.4, 3.0.0
Component/s: Block Manager, Spark Core
Labels:
None

Description

We run pyspark streaming on YARN, the executor will die caused by the error: the task released lock while finished, but the python writer haven't do real releasing lock.

Normally the task just double check the lock, but it ran wrong in front.

The executor trace log is below:
18/08/17 13:52:20 Executor task launch worker for task 137 DEBUG BlockManager: Getting local block input-0-1534485138800 18/08/17 13:52:20 Executor task launch worker for task 137 TRACE BlockInfoManager: Task 137 trying to acquire read lock for input-0-1534485138800 18/08/17 13:52:20 Executor task launch worker for task 137 TRACE BlockInfoManager: Task 137 acquired read lock for input-0-1534485138800 18/08/17 13:52:20 Executor task launch worker for task 137 DEBUG BlockManager: Level for block input-0-1534485138800 is StorageLevel(disk, memory, 1 replicas) 18/08/17 13:52:20 Executor task launch worker for task 137 INFO BlockManager: Found block input-0-1534485138800 locally 18/08/17 13:52:20 Executor task launch worker for task 137 INFO PythonRunner: Times: total = 8, boot = 3, init = 5, finish = 0 18/08/17 13:52:20 stdout writer for python TRACE BlockInfoManager: Task 137 releasing lock for input-0-1534485138800 18/08/17 13:52:20 Executor task launch worker for task 137 INFO Executor: 1 block locks were not released by TID = 137: [input-0-1534485138800] 18/08/17 13:52:20 stdout writer for python ERROR Utils: Uncaught exception in thread stdout writer for python java.lang.AssertionError: assertion failed: Block input-0-1534485138800 is not locked for reading at scala.Predef$.assert(Predef.scala:170) at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:299) at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:769) at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:540) at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:44) at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:33) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:213) at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:407) at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:215) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1991) at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:170) 18/08/17 13:52:20 stdout writer for python ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for python,5,main]

I think shoud wait WriterThread after Task#run.

Attachments

Issue Links

relates to

SPARK-18406 Race between end-of-task and completion iterator read lock release

Resolved

links to

GitHub Pull Request #24542

GitHub Pull Request #24552

GitHub Pull Request #24670

Activity

People

Assignee:: Xingbo Jiang

Reporter:: DENG FEI

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 17/Aug/18 06:27

Updated:: 12/Dec/22 18:10

Resolved:: 08/May/19 04:54