[SPARK-9523] Receiver for Spark Streaming does not naturally support kryo serializer - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Not A Problem
Affects Version/s: 1.3.1
Fix Version/s: None
Component/s: DStreams
Labels:
- kryo
- serialization
Environment:

Windows 7 local mode

Flags:

Patch

Description

In some cases, some attributes in a class is not serializable, which you still want to use after serialization of the whole object, you'll have to customize your serialization codes. For example, you can declare those attributes as transient, which makes them ignored during serialization, and then you can reassign their values during deserialization.

Now, if you're using Java serialization, you'll have to implement Serializable, and write those codes in readObject() and writeObejct() methods; And if you're using kryo serialization, you'll have to implement KryoSerializable, and write these codes in read() and write() methods.

In Spark and Spark Streaming, you can set kryo as the serializer for speeding up. However, the functions taken by RDD or DStream operations are still serialized by Java serialization, which means you only need to write those custom serialization codes in readObject() and writeObejct() methods.

But when it comes to Spark Streaming's Receiver, things are different. When you wish to customize an InputDStream, you must extend the Receiver. However, it turns out, the Receiver will be serialized by kryo if you set kryo serializer in SparkConf, and will fall back to Java serialization if you didn't.

So here's comes the problems, if you want to change the serializer by configuration and make sure the Receiver runs perfectly for both Java and kryo, you'll have to write all the 4 methods above. First, it is redundant, since you'll have to write serialization/deserialization code almost twice; Secondly, there's nothing in the doc or in the code to inform users to implement the KryoSerializable interface.

Since all other function parameters are serialized by Java only, I suggest you also make it so for the Receiver. It may be slower, but since the serialization will only be executed for each interval, it's durable. More importantly, it can cause fewer trouble

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Yuhang Chen

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 01/Aug/15 10:02

Updated:: 22/Oct/15 06:20

Resolved:: 21/Oct/15 09:31

Time Tracking

Estimated:

120h

Remaining:

120h

Logged:

Not Specified