  1. Apache Tez
  2. TEZ-753 [Umbrella] Scalability improvements
  3. TEZ-2411

Offload DataMovement event creation from the AM to the tasks


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Today the AM creates a new DataMovement event from the original event sent by the producer task and supplements it with the source/target indices for the consumer task. This event creation can be offloaded to the task runtime, saving the AM the CPU cycles spent on object creation. In addition, the original event can be kept in serialized form inside the AM and sent as-is to the task over RPC, potentially saving serialization/deserialization CPU for these events on top of the object-creation CPU. This helps when a job has many tasks running concurrently, for example 10000 tasks running in parallel and sending events to the AM.
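
      A minimal sketch of the proposed split, for illustration only. Every name here (EventOffloadSketch, RoutedEnvelope, routeOnAm, materializeOnTask) is hypothetical and not part of the Tez API, and the length-prefixed string encoding merely stands in for whatever wire format the events actually use. The assumption shown is that the AM attaches the source/target indices to the producer's untouched bytes, and the consumer task deserializes and builds the event object itself.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are Tez classes.
class EventOffloadSketch {

    /** What the AM would send over RPC: the producer's bytes plus routing indices. */
    static final class RoutedEnvelope {
        final byte[] serializedEvent; // opaque to the AM, never deserialized there
        final int sourceIndex;
        final int targetIndex;

        RoutedEnvelope(byte[] serializedEvent, int sourceIndex, int targetIndex) {
            this.serializedEvent = serializedEvent;
            this.sourceIndex = sourceIndex;
            this.targetIndex = targetIndex;
        }
    }

    /** The event object that only the consumer task materializes. */
    static final class DataMovementEvent {
        final int sourceIndex;
        final int targetIndex;
        final String payload;

        DataMovementEvent(int sourceIndex, int targetIndex, String payload) {
            this.sourceIndex = sourceIndex;
            this.targetIndex = targetIndex;
            this.payload = payload;
        }
    }

    /** Producer side: serialize once (stand-in encoding: 4-byte length + UTF-8 body). */
    static byte[] serialize(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + body.length).putInt(body.length).put(body).array();
    }

    /** AM side: routing only. The same byte array is shared by every envelope. */
    static List<RoutedEnvelope> routeOnAm(byte[] serializedEvent, int sourceIndex, int[] targetIndices) {
        List<RoutedEnvelope> envelopes = new ArrayList<>(targetIndices.length);
        for (int targetIndex : targetIndices) {
            envelopes.add(new RoutedEnvelope(serializedEvent, sourceIndex, targetIndex));
        }
        return envelopes;
    }

    /** Task side: deserialize the payload and create the per-consumer event object. */
    static DataMovementEvent materializeOnTask(RoutedEnvelope envelope) {
        ByteBuffer buf = ByteBuffer.wrap(envelope.serializedEvent);
        byte[] body = new byte[buf.getInt()];
        buf.get(body);
        String payload = new String(body, StandardCharsets.UTF_8);
        return new DataMovementEvent(envelope.sourceIndex, envelope.targetIndex, payload);
    }
}
```

      In this sketch the AM does no deserialization and allocates only a small envelope per consumer; the payload bytes are shared rather than copied. Whether the real saving is significant would depend on the actual event format and RPC layer.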


          People

            bikassaha Bikas Saha
            bikassaha Bikas Saha

