SPARK-17931

taskScheduler has some unneeded serialization

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: Scheduler, Spark Core
    • Labels: None

    Description

      In the existing code, there are three layers of serialization
      involved in sending a task from the scheduler to an executor:

      • A Task object is serialized.
      • The serialized Task is copied to a byte buffer that also
        contains serialized information about any additional JARs,
        files, and Properties needed for the task to execute. This
        byte buffer is stored as the member variable serializedTask
        in the TaskDescription class.
      • The TaskDescription is serialized (in addition to the serialized
        task + JARs, the TaskDescription class contains the task ID and
        other metadata) and sent in a LaunchTask message.
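
      The three layers listed above can be sketched roughly as follows. This
      is a minimal illustration, not Spark's actual code: the Task and
      TaskDescription classes here are simplified stand-ins, the helper names
      (javaSerialize, wrapWithDependencies, buildLaunchPayload) are
      hypothetical, and only JARs are shown as dependency information.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

public class ThreeLayerSketch {
    // Hypothetical stand-in for a Spark Task; not the real class.
    static class Task implements Serializable {
        final String name;
        Task(String name) { this.name = name; }
    }

    // Hypothetical stand-in for the old TaskDescription: metadata plus an opaque buffer.
    static class TaskDescription implements Serializable {
        final long taskId;
        final byte[] serializedTask; // output of layer 2
        TaskDescription(long taskId, byte[] serializedTask) {
            this.taskId = taskId;
            this.serializedTask = serializedTask;
        }
    }

    // Generic Java serialization, used here for layers 1 and 3.
    static byte[] javaSerialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Layer 2: dependency info first, then the already-serialized task bytes.
    static byte[] wrapWithDependencies(List<String> jars, byte[] taskBytes) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(jars.size());
            for (String jar : jars) {
                out.writeUTF(jar);
            }
            out.writeInt(taskBytes.length);
            out.write(taskBytes);
        }
        return bytes.toByteArray();
    }

    static byte[] buildLaunchPayload(long taskId, Task task, List<String> jars) throws IOException {
        byte[] layer1 = javaSerialize(task);                        // 1. serialize the Task
        byte[] layer2 = wrapWithDependencies(jars, layer1);         // 2. add JAR info in front
        return javaSerialize(new TaskDescription(taskId, layer2));  // 3. serialize the wrapper
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = buildLaunchPayload(7L, new Task("demo"), Arrays.asList("a.jar"));
        System.out.println("payload bytes: " + payload.length);
    }
}
```

      In this sketch the task bytes pass through a serializer three times
      before they reach the wire, which is the overhead the issue describes.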

      While it is necessary to have two layers of serialization, so that
      the JAR, file, and Properties info can be deserialized prior to
      deserializing the Task object, the third layer of serialization is
      unnecessary (this is a result of SPARK-2521). We should
      eliminate a layer of serialization by moving the JARs, files, and Properties
      into the TaskDescription class.
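
      One way to picture the proposed layout is a TaskDescription that holds
      the JARs, files, and Properties as ordinary fields and is written out
      once, with the serialized Task kept as the only nested byte buffer. The
      sketch below is an assumption about what such an encoding could look
      like, not the change that was merged: the class name
      FlatTaskDescriptionSketch, its field names, and the encode/decode
      helpers are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class FlatTaskDescriptionSketch {
    final long taskId;
    final Map<String, Long> addedJars;   // path -> timestamp (illustrative)
    final Map<String, Long> addedFiles;  // path -> timestamp (illustrative)
    final Properties properties;
    final byte[] serializedTask;         // the only nested serialized payload

    FlatTaskDescriptionSketch(long taskId, Map<String, Long> addedJars,
            Map<String, Long> addedFiles, Properties properties, byte[] serializedTask) {
        this.taskId = taskId;
        this.addedJars = addedJars;
        this.addedFiles = addedFiles;
        this.properties = properties;
        this.serializedTask = serializedTask;
    }

    private static void writeMap(DataOutputStream out, Map<String, Long> map) throws IOException {
        out.writeInt(map.size());
        for (Map.Entry<String, Long> entry : map.entrySet()) {
            out.writeUTF(entry.getKey());
            out.writeLong(entry.getValue());
        }
    }

    private static Map<String, Long> readMap(DataInputStream in) throws IOException {
        int size = in.readInt();
        Map<String, Long> map = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
            String key = in.readUTF();
            map.put(key, in.readLong());
        }
        return map;
    }

    // Single pass: metadata and dependency info are written directly, task bytes last.
    static byte[] encode(FlatTaskDescriptionSketch d) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(d.taskId);
            writeMap(out, d.addedJars);
            writeMap(out, d.addedFiles);
            out.writeInt(d.properties.stringPropertyNames().size());
            for (String key : d.properties.stringPropertyNames()) {
                out.writeUTF(key);
                out.writeUTF(d.properties.getProperty(key));
            }
            out.writeInt(d.serializedTask.length);
            out.write(d.serializedTask);
        }
        return bytes.toByteArray();
    }

    // Reads everything except the Task itself, which stays serialized until needed.
    static FlatTaskDescriptionSketch decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        long taskId = in.readLong();
        Map<String, Long> jars = readMap(in);
        Map<String, Long> files = readMap(in);
        Properties props = new Properties();
        int propCount = in.readInt();
        for (int i = 0; i < propCount; i++) {
            String key = in.readUTF();
            props.setProperty(key, in.readUTF());
        }
        byte[] task = new byte[in.readInt()];
        in.readFully(task);
        return new FlatTaskDescriptionSketch(taskId, jars, files, props, task);
    }
}
```

      With this shape the receiver can read the JARs, files, and Properties
      before touching the Task bytes, so the ordering requirement is kept
      while one serialization pass is dropped.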

            People

              kayousterhout Kay Ousterhout
              gq Guoqiang Li