[SPARK-7708] Incorrect task serialization with Kryo closure serializer - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 1.2.2
Fix Version/s: None
Component/s: Spark Core
Labels:
- bulk-closed

Description

I've been investigating the use of Kryo for closure serialization with Spark 1.2, and it seems like I've hit upon a bug:

When a task is serialized before scheduling, the following log message is generated:

[info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, <host>, PROCESS_LOCAL, 302 bytes)

This message comes from TaskSetManager, which serializes the task using the closure serializer. Before the task is sent to the executor, the TaskDescription (which includes the original task as a byte array) is serialized again into a byte array with the closure serializer. I added a log message for this step in CoarseGrainedSchedulerBackend, which produces the following output:

[info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132

The serialized size of the TaskDescription (132 bytes) turns out to be smaller than the serialized task that it contains (302 bytes). This implies that TaskDescription.buffer is not being serialized correctly.

On the executor side, the deserialization produces a null value for TaskDescription.buffer.
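The reasoning above (an envelope that serializes to fewer bytes than the payload it embeds has probably lost that payload) can be illustrated with a minimal sketch. It rests on several assumptions: it uses Java's built-in serialization rather than Kryo, the classes `Envelope` and `LossyEnvelope` and the helper `payloadPlausiblyIncluded` are hypothetical stand-ins and not Spark code, and the dropped buffer is simulated with a `transient` field, which is not claimed to be the cause of the behaviour reported here. The size comparison only holds when the serializer does not compress its output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SizeCheck {
    // Hypothetical stand-in for TaskDescription: an id plus the serialized task bytes.
    static class Envelope implements Serializable {
        private static final long serialVersionUID = 1L;
        final long taskId;
        final byte[] buffer;

        Envelope(long taskId, byte[] buffer) {
            this.taskId = taskId;
            this.buffer = buffer;
        }
    }

    // Same shape, but the payload is transient, so Java serialization skips it.
    // This only simulates the symptom; it is not the cause of the Kryo issue.
    static class LossyEnvelope implements Serializable {
        private static final long serialVersionUID = 1L;
        final long taskId;
        final transient byte[] buffer;

        LossyEnvelope(long taskId, byte[] buffer) {
            this.taskId = taskId;
            this.buffer = buffer;
        }
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Without compression, an envelope cannot be smaller than the payload it embeds.
    static boolean payloadPlausiblyIncluded(int envelopeSize, int payloadSize) {
        return envelopeSize >= payloadSize;
    }

    public static void main(String[] args) throws Exception {
        byte[] task = new byte[302]; // same length as the serialized task in the log line
        byte[] intact = serialize(new Envelope(342L, task));
        byte[] lossy = serialize(new LossyEnvelope(342L, task));

        System.out.println("intact envelope passes check: "
                + payloadPlausiblyIncluded(intact.length, task.length));
        System.out.println("lossy envelope passes check: "
                + payloadPlausiblyIncluded(lossy.length, task.length));

        // Deserializing the lossy envelope yields a null buffer, the symptom seen on the executor.
        LossyEnvelope received = (LossyEnvelope) deserialize(lossy);
        System.out.println("buffer is null after round trip: " + (received.buffer == null));
    }
}
```

A check of this kind could be logged next to the size message in the scheduler backend to flag a dropped buffer before the task reaches the executor, though whether that is appropriate depends on the serializer in use.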

Issue Links

is related to

SPARK-11416 Upgrade kryo package to version 3.0

Resolved

relates to

SPARK-4321 Make Kryo serialization work for closures

Resolved

links to

[Github] Pull Request #6361 (coolfrood)

People

Assignee: Unassigned

Reporter: Akshat Aranya

Votes: 1

Watchers: 15

Dates

Created: 18/May/15 16:03

Updated: 21/May/19 04:37

Resolved: 21/May/19 04:37