Spark / SPARK-22083

When dropping multiple blocks to disk, Spark should release all locks on a failure


Details

    Description

      MemoryStore.evictBlocksToFreeSpace first acquires write locks on all the blocks it intends to evict. However, if an exception is thrown while dropping the blocks, there is no finally block to release the locks that are still held.
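The missing cleanup can be sketched as below. This is a minimal illustration, not the actual Spark code or patch: `ToyLockManager`, `EvictionSketch.evictBlocks`, and the `dropBlock` callback are hypothetical stand-ins for Spark's lock bookkeeping and drop path. It assumes, as the description says of DiskStore.put, that dropping a block releases that block's own lock whether the drop succeeds or throws.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's block lock bookkeeping (not the real API).
final class ToyLockManager {
  private val writeLocked = mutable.Set.empty[String]

  def lockForWriting(blockId: String): Unit = synchronized {
    require(!writeLocked.contains(blockId), s"$blockId is already write-locked")
    writeLocked += blockId
  }

  def unlock(blockId: String): Unit = synchronized {
    writeLocked.remove(blockId)
  }

  def lockedBlocks: Set[String] = synchronized { writeLocked.toSet }
}

object EvictionSketch {
  // Locks every selected block up front, then drops them one by one.
  // Assumption: dropBlock releases the lock of the block it is given,
  // both on success and on failure.
  def evictBlocks(
      locks: ToyLockManager,
      blockIds: Seq[String],
      dropBlock: String => Unit): Unit = {
    blockIds.foreach(locks.lockForWriting)
    var attempted = 0
    try {
      blockIds.foreach { id =>
        attempted += 1
        dropBlock(id)
      }
    } finally {
      // After a failure, the blocks that were never attempted still hold
      // their write locks; release them here. On success this is a no-op.
      blockIds.drop(attempted).foreach(locks.unlock)
    }
  }
}
```

With three blocks and a `dropBlock` that throws on the second one, the exception still propagates to the caller, but no block stays write-locked afterwards. Without the finally block, the third block would remain locked.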

      If only one block is being dropped, this (probably) isn't a problem. Usually the call stack goes from MemoryStore.evictBlocksToFreeSpace --> dropBlocks --> BlockManager.dropFromMemory --> DiskStore.put, and DiskStore.put does call removeBlock() in a finally block, which cleans up the lock for that block.

      I ran into this via the serialization issue in SPARK-21928. There, a netty thread ends up trying to evict some blocks from memory to disk, and fails. When only one block needs to be evicted and the error occurs, there isn't any real problem; I assume that netty thread is dead, but the executor threads seem fine. However, in the cases where two blocks get dropped, one task gets completely stuck. Unfortunately I don't have a stack trace from the stuck executor, but I assume it just waits forever on the lock that never gets released.
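The suspected hang can be reproduced in miniature. The sketch below is a toy model, not Spark code: `LeakedLockDemo`, the block names, and the use of one `java.util.concurrent.Semaphore` per block as a "write lock" are all assumptions made for illustration. An evicting thread locks two blocks, fails on the first, and only that block's lock is cleaned up. A later reader uses a bounded `tryAcquire` so the demo terminates instead of waiting forever.

```scala
import java.util.concurrent.{Semaphore, TimeUnit}

object LeakedLockDemo {
  // Returns (acquired block-a, acquired block-b) as seen by a later reader.
  def run(): (Boolean, Boolean) = {
    // Hypothetical per-block write locks; one permit each.
    val locks: Map[String, Semaphore] =
      Seq("block-a", "block-b").map(id => id -> new Semaphore(1)).toMap

    // Mimics the buggy path: lock both blocks, fail while dropping the
    // first, and clean up only the lock of the block that failed.
    def buggyEvict(): Unit = {
      locks.values.foreach(_.acquire())
      try {
        throw new RuntimeException("simulated serialization failure")
      } finally {
        locks("block-a").release()
      }
    }

    val evictor = new Thread(() =>
      try buggyEvict() catch { case _: RuntimeException => () })
    evictor.start()
    evictor.join()

    // A task that needs the blocks afterwards. A blocking acquire() on
    // block-b would never return; the timeout keeps the demo finite.
    val gotA = locks("block-a").tryAcquire(100, TimeUnit.MILLISECONDS)
    val gotB = locks("block-b").tryAcquire(100, TimeUnit.MILLISECONDS)
    (gotA, gotB)
  }
}
```

Calling `LeakedLockDemo.run()` returns `(true, false)`: the lock on the block whose drop failed is available again, while the second block's lock is leaked. This matches the report that a single-block eviction failure is harmless but a two-block one leaves a task stuck.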

          People

            Assignee: Imran Rashid (irashid)
            Reporter: Imran Rashid (irashid)