Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
In the existing code, there are three layers of serialization
involved in sending a task from the scheduler to an executor:
- A Task object is serialized.
- The Task object is copied to a byte buffer that also
contains serialized information about any additional JARs,
files, and Properties needed for the task to execute. This
byte buffer is stored as the member variable serializedTask
in the TaskDescription class.
- The TaskDescription is serialized (in addition to the serialized
task + JARs, the TaskDescription class contains the task ID and
other metadata) and sent in a LaunchTask message.
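The three layers listed above can be sketched roughly as follows. This is an illustrative Python model, not Spark's code: the function names, the use of `pickle`, and the length-prefixed buffer layout are all assumptions made for the example.

```python
import pickle
import struct


def serialize_task(task):
    # Layer 1: the task object itself.
    return pickle.dumps(task)


def serialize_with_dependencies(task_bytes, jars, files, properties):
    # Layer 2: dependency info is serialized and placed in front of the
    # task bytes, with a 4-byte length prefix so it can be read first.
    deps = pickle.dumps({"jars": jars, "files": files, "properties": properties})
    return struct.pack(">I", len(deps)) + deps + task_bytes


def serialize_task_description(task_id, serialized_task):
    # Layer 3: a wrapper holding the task ID plus the layer-2 buffer.
    return pickle.dumps({"task_id": task_id, "serialized_task": serialized_task})


def executor_receive(message):
    desc = pickle.loads(message)                  # undo layer 3
    buf = desc["serialized_task"]
    (deps_len,) = struct.unpack_from(">I", buf, 0)
    deps = pickle.loads(buf[4:4 + deps_len])      # undo layer 2
    # An executor would fetch the JARs and files here, before
    # touching the task bytes.
    task = pickle.loads(buf[4 + deps_len:])       # undo layer 1
    return desc["task_id"], deps, task
```

Note that the receiver has to unwrap the outer message before it can even see the dependency info, which is the redundancy the issue describes.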
While two layers of serialization are necessary, so that
the JAR, file, and Property info can be deserialized before
the Task object is deserialized, the third layer of serialization is
unnecessary (this is a result of SPARK-2521). We should
eliminate a layer of serialization by moving the JARs, files, and Properties
into the TaskDescription class.
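A minimal sketch of the proposed layout, assuming the dependency info becomes plain fields of the description and the already-serialized task is appended as opaque bytes. The field names, the `encode`/`decode` helpers, and the wire format here are hypothetical and written in Python for brevity; they are not taken from the Spark source.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class TaskDescription:
    task_id: int
    jars: dict = field(default_factory=dict)        # JAR name -> timestamp
    files: dict = field(default_factory=dict)       # file name -> timestamp
    properties: dict = field(default_factory=dict)  # string key -> string value
    serialized_task: bytes = b""                    # opaque, already-serialized Task


def _write_str(out, s):
    data = s.encode("utf-8")
    out.extend(struct.pack(">I", len(data)))
    out.extend(data)


class _Reader:
    def __init__(self, buf):
        self.buf = buf
        self.pos = 0

    def unpack(self, fmt):
        (value,) = struct.unpack_from(fmt, self.buf, self.pos)
        self.pos += struct.calcsize(fmt)
        return value

    def read_str(self):
        n = self.unpack(">I")
        s = self.buf[self.pos:self.pos + n].decode("utf-8")
        self.pos += n
        return s


def encode(desc):
    # Metadata and dependency info are written field by field.
    out = bytearray(struct.pack(">q", desc.task_id))
    for mapping in (desc.jars, desc.files):
        out.extend(struct.pack(">I", len(mapping)))
        for name, timestamp in mapping.items():
            _write_str(out, name)
            out.extend(struct.pack(">q", timestamp))
    out.extend(struct.pack(">I", len(desc.properties)))
    for key, value in desc.properties.items():
        _write_str(out, key)
        _write_str(out, value)
    out.extend(desc.serialized_task)  # task bytes go last, untouched
    return bytes(out)


def decode(buf):
    # Everything except the task is recovered without deserializing the task.
    r = _Reader(buf)
    task_id = r.unpack(">q")
    jars, files = {}, {}
    for mapping in (jars, files):
        for _ in range(r.unpack(">I")):
            name = r.read_str()
            mapping[name] = r.unpack(">q")
    properties = {}
    for _ in range(r.unpack(">I")):
        key = r.read_str()
        properties[key] = r.read_str()
    return TaskDescription(task_id, jars, files, properties, buf[r.pos:])
```

With this shape only two serialization steps remain: the task itself, and the description that carries it. The receiver can read the JARs, files, and properties directly from the decoded description and deserialize `serialized_task` afterwards.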
Issue Links
- is related to: SPARK-19796 taskScheduler fails serializing long statements received by thrift server [Resolved]
- relates to: SPARK-18890 Do all task serialization in CoarseGrainedExecutorBackend thread (rather than TaskSchedulerImpl) [Resolved]
- links to