Description
There are two issues with the current Kafka support:
- Use of Write Ahead Logs in Spark Streaming to ensure no data is lost. This causes the data to be replicated in both Kafka and Spark Streaming.
- Lack of exactly-once semantics. For background, see http://apache-spark-developers-list.1001551.n3.nabble.com/Which-committers-care-about-Kafka-td9827.html
We want to solve both of these problems in this JIRA. Please see the following design doc for the solution:
https://docs.google.com/a/databricks.com/document/d/1IuvZhg9cOueTf1mq4qwc1fhPb5FVcaRLcyjrtG4XU1k/edit#heading=h.itproy77j3p
Attachments
Issue Links
- supersedes: SPARK-2803 add Kafka stream feature for fetch messages from specified starting offset position (Resolved)
- links to