Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-25139

PythonRunner#WriterThread released block after TaskRunner finally block which invoke BlockManager#releaseAllLocksForTask

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.1
    • Fix Version/s: 2.3.4, 2.4.4, 3.0.0
    • Component/s: Block Manager
    • Labels:
      None

      Description

      We run pyspark streaming on YARN, the executor will die caused by the error: the task released lock while finished, but the python writer haven't do real releasing lock.

      Normally the task just double check the lock, but it ran wrong in front.

      The executor trace log is below:
      18/08/17 13:52:20 Executor task launch worker for task 137 DEBUG BlockManager: Getting local block input-0-1534485138800 18/08/17 13:52:20 Executor task launch worker for task 137 TRACE BlockInfoManager: Task 137 trying to acquire read lock for input-0-1534485138800 18/08/17 13:52:20 Executor task launch worker for task 137 TRACE BlockInfoManager: Task 137 acquired read lock for input-0-1534485138800 18/08/17 13:52:20 Executor task launch worker for task 137 DEBUG BlockManager: Level for block input-0-1534485138800 is StorageLevel(disk, memory, 1 replicas) 18/08/17 13:52:20 Executor task launch worker for task 137 INFO BlockManager: Found block input-0-1534485138800 locally 18/08/17 13:52:20 Executor task launch worker for task 137 INFO PythonRunner: Times: total = 8, boot = 3, init = 5, finish = 0 18/08/17 13:52:20 stdout writer for python TRACE BlockInfoManager: Task 137 releasing lock for input-0-1534485138800 18/08/17 13:52:20 Executor task launch worker for task 137 INFO Executor: 1 block locks were not released by TID = 137: [input-0-1534485138800] 18/08/17 13:52:20 stdout writer for python ERROR Utils: Uncaught exception in thread stdout writer for python java.lang.AssertionError: assertion failed: Block input-0-1534485138800 is not locked for reading at scala.Predef$.assert(Predef.scala:170) at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:299) at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:769) at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:540) at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:44) at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:33) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:213) at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:407) at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:215) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1991) at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:170) 18/08/17 13:52:20 stdout writer for python ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for python,5,main]

       

      I think shoud wait WriterThread after Task#run.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                jiangxb1987 Xingbo Jiang
                Reporter:
                Deng FEI DENG FEI
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: