Description
Currently the Spark Streaming Kafka API stores only the key and value of each message into the BlockManager (BM) for processing, which loses flexibility for several requirements:
1. The topic/partition/offset information for each message is discarded by KafkaInputDStream. In some scenarios this information is needed to better filter messages, as described in SPARK-2388.
2. People may want to attach a timestamp to each message as it is fed into Spark Streaming, to better measure end-to-end latency.
3. Checkpointing the partitions/offsets, among other uses.
So here we add a messageHandler to the interface to give people the flexibility to preprocess each message before it is stored into the BlockManager. At the same time, this improvement stays compatible with the current API.
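A minimal sketch of what such a handler could look like. In the direct Kafka API the handler is a function from MessageAndMetadata[K, V] to an arbitrary record type R; the MessageAndMetadata class below is a simplified stand-in (not Kafka's actual class) so the example is self-contained, and the Enriched record and field names are illustrative assumptions, not the proposed API.

```java
import java.util.function.Function;

public class MessageHandlerSketch {
    // Simplified stand-in for Kafka's MessageAndMetadata (assumption, for illustration only).
    record MessageAndMetadata<K, V>(String topic, int partition, long offset, K key, V message) {}

    // The record type R stored into the BlockManager instead of the bare (key, value) pair:
    // it keeps topic/partition/offset (point 1), adds an ingest timestamp (point 2),
    // and exposes the offset for checkpointing (point 3).
    record Enriched<K, V>(String topic, int partition, long offset,
                          long ingestTimeMs, K key, V value) {}

    // A messageHandler that enriches each message instead of dropping its metadata.
    static <K, V> Function<MessageAndMetadata<K, V>, Enriched<K, V>> messageHandler() {
        return mmd -> new Enriched<>(mmd.topic(), mmd.partition(), mmd.offset(),
                System.currentTimeMillis(), mmd.key(), mmd.message());
    }

    public static void main(String[] args) {
        Enriched<String, String> rec = MessageHandlerSketch.<String, String>messageHandler()
                .apply(new MessageAndMetadata<>("logs", 0, 42L, "k1", "payload"));
        // Topic and offset survive into the stored record, so downstream code
        // can filter by topic (cf. SPARK-2388).
        System.out.println(rec.topic() + "@" + rec.offset());
    }
}
```

The default handler would simply return the (key, value) pair, which is how compatibility with the current API is preserved.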
Issue Links
- incorporates: SPARK-2388 "Streaming from multiple different Kafka topics is problematic" (Closed)
- is related to: SPARK-4960 "Interceptor pattern in receivers" (Resolved)