Spark / SPARK-7708

Incorrect task serialization with Kryo closure serializer

    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.2
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

      Description

      I've been investigating the use of Kryo for closure serialization with Spark 1.2, and it seems like I've hit upon a bug:
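
      In Spark 1.x, the closure serializer could be switched via the `spark.closure.serializer` property (the documentation of the time cautioned that only the Java serializer was supported, which is consistent with this report). A sketch of the configuration being investigated, e.g. in `spark-defaults.conf`:

      ```properties
      # Default closure serializer is org.apache.spark.serializer.JavaSerializer;
      # this experiment swaps in Kryo for closure serialization as well.
      spark.closure.serializer  org.apache.spark.serializer.KryoSerializer
      ```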

      When a task is serialized before scheduling, the following log message is generated:

      [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, <host>, PROCESS_LOCAL, 302 bytes)

      This message comes from TaskSetManager, which serializes the task using the closure serializer. Before the message is sent out, the TaskDescription (which includes the original serialized task as a byte array) is serialized again into a byte array with the closure serializer. I added a log message for this in CoarseGrainedSchedulerBackend, which produces the following output:

      [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132

      The serialized size of the TaskDescription (132 bytes) turns out to be smaller than that of the serialized task it contains (302 bytes). This implies that TaskDescription.buffer is not getting serialized correctly.

      On the executor side, the deserialization produces a null value for TaskDescription.buffer.
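
      The size comparison above is a handy sanity check for any serializer: a wrapper cannot legitimately serialize to fewer bytes than the payload it contains. A minimal illustration of that invariant using Python's pickle (not Spark's Kryo path; `TaskDescription` here is a hypothetical stand-in for the real Spark class):

      ```python
      import pickle

      class TaskDescription:
          """Hypothetical stand-in for Spark's TaskDescription: a wrapper
          holding an already-serialized task as a byte array."""
          def __init__(self, buffer):
              self.buffer = buffer

      # Pretend task bytes (in Spark, the task serialized by TaskSetManager).
      serialized_task = pickle.dumps({"stage": 0.0, "partition": 124})
      desc = TaskDescription(serialized_task)

      # Serialize the wrapper again, as CoarseGrainedSchedulerBackend does.
      wrapper_bytes = pickle.dumps(desc)

      # Invariant: the wrapper must be at least as large as its payload,
      # and the payload must survive a round trip. A correct serializer
      # passes both checks; the bug reported here would fail them.
      assert len(wrapper_bytes) >= len(serialized_task)
      assert pickle.loads(wrapper_bytes).buffer == serialized_task
      ```

      With the buggy Kryo closure path described above, the equivalent checks fail: the re-serialized TaskDescription is smaller than its payload, and the buffer comes back null on the executor.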


              People

              • Assignee: Unassigned
              • Reporter: aaranya (Akshat Aranya)
              • Votes: 1
              • Watchers: 15
