SPARK-7708: Incorrect task serialization with Kryo closure serializer

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.2.2
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      I've been investigating the use of Kryo for closure serialization with Spark 1.2, and it seems I've hit a bug.
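
      For reference, the setup under test looks roughly like the sketch below. It assumes the Spark 1.x spark.closure.serializer setting (which defaults to Java serialization); the app name, master, and sample job are made up.

      import org.apache.spark.{SparkConf, SparkContext}

      object KryoClosureSetup {
        def main(args: Array[String]): Unit = {
          // Switch the closure serializer from the default JavaSerializer to Kryo.
          val conf = new SparkConf()
            .setAppName("kryo-closure-test")
            .setMaster("local[2]")
            .set("spark.closure.serializer", "org.apache.spark.serializer.KryoSerializer")
          val sc = new SparkContext(conf)

          // Any job that ships a closure to the executors exercises the Kryo path.
          val sum = sc.parallelize(1 to 1000).map(_ * 2).reduce(_ + _)
          println(s"sum = $sum")
          sc.stop()
        }
      }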

      When a task is serialized before scheduling, the following log message is generated:

      [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, <host>, PROCESS_LOCAL, 302 bytes)

      This message comes from TaskSetManager, which serializes the task using the closure serializer. Before the message is sent out, the TaskDescription (which includes the serialized task as a byte array) is serialized again into a byte array with the closure serializer. I added a log message for this in CoarseGrainedSchedulerBackend, which produces the following output:

      [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132

      The serialized size of the TaskDescription (132 bytes) turns out to be smaller than that of the serialized task it contains (302 bytes). This implies that TaskDescription.buffer is not being serialized correctly.

      On the executor side, the deserialization produces a null value for TaskDescription.buffer.
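
      The symptom can be probed outside Spark with a plain Kryo round trip of an envelope object that wraps an already-serialized payload in a ByteBuffer, which is roughly what TaskDescription does with the task bytes. This is only a sketch: the Envelope class and payload size are made up, and the exact outcome (a suspiciously small serialized size, a null buffer after deserialization, or an outright exception) depends on the Kryo version and configuration.

      import java.nio.ByteBuffer

      import com.esotericsoftware.kryo.Kryo
      import com.esotericsoftware.kryo.io.{Input, Output}

      // Hypothetical stand-in for TaskDescription: an envelope carrying an
      // already-serialized task as a ByteBuffer. The no-arg constructor keeps
      // Kryo's default FieldSerializer happy.
      class Envelope {
        var taskId: Long = 0L
        var buffer: ByteBuffer = null
      }

      object KryoBufferRoundTrip {
        def main(args: Array[String]): Unit = {
          val kryo = new Kryo()
          kryo.setRegistrationRequired(false)

          val payload = Array.fill[Byte](302)(1.toByte)  // stand-in for the 302-byte serialized task
          val original = new Envelope
          original.taskId = 342L
          original.buffer = ByteBuffer.wrap(payload)

          // Serialize the envelope, mirroring the second serialization pass
          // applied to the TaskDescription before it is sent to the executor.
          val out = new Output(4096)
          kryo.writeClassAndObject(out, original)
          out.close()
          val bytes = out.toBytes
          println(s"envelope serialized size = ${bytes.length} bytes")

          // Deserialize and check whether the nested buffer survived; in the
          // reported bug the corresponding field comes back null on the executor.
          val in = new Input(bytes)
          val copy = kryo.readClassAndObject(in).asInstanceOf[Envelope]
          in.close()
          println(s"buffer after round trip = ${copy.buffer}")
        }
      }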


            People

              Assignee: Unassigned
              Reporter: Akshat Aranya (aaranya)
              Votes: 1
              Watchers: 15
