Spark / SPARK-14055

AssertionError may occur if the write lock is not released in the 'removeBlock' method


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: Block Manager, Spark Core
    • Labels: None
    • Environment: Spark 2.0-SNAPSHOT
      Single Rack
      Standalone mode scheduling
      8 node cluster
      16 cores & 64G RAM / node
      Data Replication factor of 2

      Each node has 1 Spark executor configured with 16 cores and 40 GB of RAM.

    Description

      We got the following log when running LiveJournalPageRank.

      452823:16/03/21 19:28:47.444 TRACE BlockInfoManager: Task 1662 trying to acquire write lock for rdd_3_183
      452825:16/03/21 19:28:47.445 TRACE BlockInfoManager: Task 1662 acquired write lock for rdd_3_183
      456941:16/03/21 19:28:47.596 INFO BlockManager: Dropping block rdd_3_183 from memory
      456943:16/03/21 19:28:47.597 DEBUG MemoryStore: Block rdd_3_183 of size 418784648 dropped from memory (free 3504141600)
      457027:16/03/21 19:28:47.600 DEBUG BlockManagerMaster: Updated info of block rdd_3_183
      457053:16/03/21 19:28:47.600 DEBUG BlockManager: Told master about block rdd_3_183
      457082:16/03/21 19:28:47.602 TRACE BlockInfoManager: Task 1662 trying to remove block rdd_3_183
      500373:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 trying to put rdd_3_183
      500374:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 trying to acquire read lock for rdd_3_183
      500375:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 trying to acquire write lock for rdd_3_183
      500376:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 acquired write lock for rdd_3_183
      517257:16/03/21 19:28:56.299 INFO BlockInfoManager: ****** taskAttemptId is: 1662, info.writerTask is: 1681, blockID is: rdd_3_183 so AssertionError happeneds here*****
      517258-16/03/21 19:28:56.299 ERROR Executor: Exception in task 177.0 in stage 10.0 (TID 1662)
      517259-java.lang.AssertionError: assertion failed
      517260- at scala.Predef$.assert(Predef.scala:151)
      517261- at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1$$anonfun$apply$1.apply(BlockInfoManager.scala:356)
      517262- at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1$$anonfun$apply$1.apply(BlockInfoManager.scala:351)
      517263- at scala.Option.foreach(Option.scala:257)
      517264- at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1.apply(BlockInfoManager.scala:351)
      517265- at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1.apply(BlockInfoManager.scala:350)
      517266- at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
      517267- at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:350)
      517268- at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:626)
      517269- at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:238)

      This AssertionError may occur when memory for RDD storage is insufficient and several partitions have to be evicted.
      In the example above, several partitions (including rdd_3_183) need to be evicted while Task 1662 is running. Task 1662 first acquires the read and write locks, then calls dropBlock in MemoryStore.evictBlocksToFreeSpace, which actually drops rdd_3_183 from memory. Because newEffectiveStorageLevel.isValid is false, execution reaches BlockInfoManager.removeBlock, but writeLocksByTask is not updated there, so it still lists rdd_3_183 under Task 1662.

      Unfortunately, Task 1681 has already started and needs to recompute rdd_3_183 to produce its target RDD, so it acquires the write lock on rdd_3_183. When Task 1662 finally calls releaseAllLocksForTask, it finds rdd_3_183 in its stale write-lock set while the block's writerTask is 1681, and the AssertionError occurs.
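
      The interleaving described above can be replayed with a small, simplified model. This is only a sketch in Python, not Spark's actual Scala code: ToyBlockInfoManager, its method names, and the clean_up_on_remove flag are hypothetical and exist only for illustration. The flag shows one possible remedy (dropping the block from the task's write-lock set inside remove_block); it is an assumption and not necessarily what the actual patch does.

```python
class BlockInfo:
    """Per-block lock state: which task (if any) holds the write lock."""

    def __init__(self):
        self.writer_task = None


class ToyBlockInfoManager:
    """Hypothetical, heavily simplified stand-in for BlockInfoManager."""

    def __init__(self, clean_up_on_remove):
        self.infos = {}                # block id -> BlockInfo
        self.write_locks_by_task = {}  # task id -> set of block ids
        self.clean_up_on_remove = clean_up_on_remove

    def lock_for_writing(self, task, block):
        # Register the block if it is missing, then take its write lock.
        info = self.infos.setdefault(block, BlockInfo())
        if info.writer_task is not None:
            raise RuntimeError("block is already write-locked")
        info.writer_task = task
        self.write_locks_by_task.setdefault(task, set()).add(block)

    def remove_block(self, task, block):
        # Only the task holding the write lock may remove the block.
        if self.infos[block].writer_task != task:
            raise RuntimeError("task does not hold the write lock")
        del self.infos[block]
        if self.clean_up_on_remove:
            # Assumed remedy: forget the lock together with the block.
            self.write_locks_by_task[task].discard(block)
        # Otherwise the per-task set keeps a stale entry for the block.

    def release_all_locks_for_task(self, task):
        for block in self.write_locks_by_task.pop(task, set()):
            info = self.infos.get(block)
            if info is not None:
                # Mirrors the failing assert: the block recorded for this
                # task must still be write-locked by this task.
                if info.writer_task != task:
                    raise AssertionError("assertion failed")
                info.writer_task = None


def replay(clean_up_on_remove):
    """Replay the sequence from the log for rdd_3_183."""
    mgr = ToyBlockInfoManager(clean_up_on_remove)
    mgr.lock_for_writing(1662, "rdd_3_183")  # Task 1662 takes the write lock
    mgr.remove_block(1662, "rdd_3_183")      # eviction removes the block
    mgr.lock_for_writing(1681, "rdd_3_183")  # Task 1681 recomputes the block
    try:
        mgr.release_all_locks_for_task(1662)  # Task 1662 finishes
    except AssertionError:
        return "AssertionError"
    return "ok"


if __name__ == "__main__":
    print(replay(clean_up_on_remove=False))  # AssertionError
    print(replay(clean_up_on_remove=True))   # ok
```

      Without the cleanup, the stale entry makes Task 1662 inspect a block that Task 1681 now owns. With the cleanup, Task 1662 has nothing left to release for rdd_3_183 and Task 1681 keeps its write lock.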


          People

            Assignee: Earne Yuanzhen Geng
            Reporter: Earne Yuanzhen Geng
            Votes: 0
            Watchers: 3
