Spark / SPARK-38965

Optimize RemoteBlockPushResolver with a memory pool


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: Shuffle
    • Labels: None

    Description

      For the push-based shuffle service, many BLOCK_APPEND_COLLISION_DETECTED failures occur when there are many small map task outputs. In RemoteBlockPushResolver, while a block pushed by one map task is being written, blocks pushed by other map tasks fail in the onComplete() method.
      In addition, RemoteBlockPushResolver has no memory limit, so many executors can run out of memory (OOM) when many small pushed blocks are waiting to be written to the final data file.
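The idea in the title can be illustrated with a minimal sketch, assuming a bounded memory pool is used to park completed blocks while another map task holds the data file, instead of failing them immediately. This is not Spark's RemoteBlockPushResolver API: the class names `BlockMemoryPool` and `PartitionMergerSketch`, the methods `start_writing`, `try_acquire` and `release`, and the `capacity_bytes` parameter are all hypothetical. The sketch models only write ordering and memory accounting, not disk I/O, and is written in Python for brevity.

```python
import threading
from collections import deque


class BlockMemoryPool:
    """Hypothetical bounded pool for pushed-block bytes held in memory."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._lock = threading.Lock()

    def try_acquire(self, n: int) -> bool:
        # Reserve n bytes only if the pool stays within its capacity.
        with self._lock:
            if self.used_bytes + n > self.capacity_bytes:
                return False
            self.used_bytes += n
            return True

    def release(self, n: int) -> None:
        with self._lock:
            self.used_bytes -= n


class PartitionMergerSketch:
    """Hypothetical merger: only one map task appends to the data file at a time."""

    def __init__(self, pool: BlockMemoryPool) -> None:
        self.pool = pool
        self.current_writer = None  # map index currently appending to the data file
        self.merged = []            # map indices appended, in order
        self.pending = deque()      # (map_index, size) blocks parked in memory

    def start_writing(self, map_index: int) -> bool:
        # A stream may take the data file only if nobody else holds it.
        if self.current_writer is None:
            self.current_writer = map_index
            return True
        return False

    def on_complete(self, map_index: int, size: int) -> str:
        if self.current_writer in (None, map_index):
            # File is free or already held by this map task: append it,
            # release the file, then drain any parked blocks.
            self.merged.append(map_index)
            self.current_writer = None
            self._drain()
            return "merged"
        # Another map task holds the file: park the block instead of failing,
        # but only if the memory pool has room for it.
        if self.pool.try_acquire(size):
            self.pending.append((map_index, size))
            return "deferred"
        # Pool exhausted: reject the block, so memory stays bounded.
        return "collision"

    def _drain(self) -> None:
        # Append parked blocks in arrival order and return their memory.
        while self.pending:
            map_index, size = self.pending.popleft()
            self.merged.append(map_index)
            self.pool.release(size)
```

For example, with a 100-byte pool, if map 0 is writing, a 60-byte block from map 1 is deferred, a second 60-byte block from map 2 is rejected because the pool is full, and when map 0 completes, map 1 is appended and its 60 bytes are released. The design choice is that collisions become waits while memory is available, and memory use is capped instead of growing without limit.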

          People

            Assignee: Unassigned
            Reporter: Wan Kun (wankun)
