Description
Currently the Spark Streaming Kafka API stores only the key and value of each message into the BlockManager (BM) for processing, which loses flexibility for several requirements:
1. The topic/partition/offset information for each message is discarded by KafkaInputDStream. In some scenarios this information is needed to better filter messages, as described in SPARK-2388.
2. People may want to attach a timestamp to each message as it is fed into Spark Streaming, to better measure end-to-end latency.
3. Checkpointing the partitions/offsets, among other uses.
So here we add a messageHandler to the interface to give people the flexibility to preprocess each message before it is stored into the BlockManager. At the same time, this improvement stays compatible with the current API.
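A minimal sketch of what such a handler could look like. In the direct Kafka API the handler is a function from MessageAndMetadata[K, V] to an arbitrary record type R; the MessageAndMetadata class below is a simplified stand-in (not Kafka's actual class) so the example is self-contained, and the Enriched record and field names are illustrative assumptions, not the proposed API.

```java
import java.util.function.Function;

public class MessageHandlerSketch {
    // Simplified stand-in for Kafka's MessageAndMetadata (assumption, for illustration only).
    record MessageAndMetadata<K, V>(String topic, int partition, long offset, K key, V message) {}

    // The record type R stored into the BlockManager instead of the bare (key, value) pair:
    // it keeps topic/partition/offset (point 1), adds an ingest timestamp (point 2),
    // and exposes the offset for checkpointing (point 3).
    record Enriched<K, V>(String topic, int partition, long offset,
                          long ingestTimeMs, K key, V value) {}

    // A messageHandler that enriches each message instead of dropping its metadata.
    static <K, V> Function<MessageAndMetadata<K, V>, Enriched<K, V>> messageHandler() {
        return mmd -> new Enriched<>(mmd.topic(), mmd.partition(), mmd.offset(),
                System.currentTimeMillis(), mmd.key(), mmd.message());
    }

    public static void main(String[] args) {
        Enriched<String, String> rec = MessageHandlerSketch.<String, String>messageHandler()
                .apply(new MessageAndMetadata<>("logs", 0, 42L, "k1", "payload"));
        // Topic and offset survive into the stored record, so downstream code
        // can filter by topic (cf. SPARK-2388).
        System.out.println(rec.topic() + "@" + rec.offset());
    }
}
```

The default handler would simply return the (key, value) pair, which is how compatibility with the current API is preserved.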
Issue Links
- incorporates: SPARK-2388 "Streaming from multiple different Kafka topics is problematic" (Closed)
- is related to: SPARK-4960 "Interceptor pattern in receivers" (Resolved)