[SPARK-4964] Exactly-once + WAL-free Kafka Support in Spark Streaming - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.3.0
Component/s: DStreams
Labels: None

Description

There are two issues with the current Kafka support:

1. Use of Write Ahead Logs (WAL) in Spark Streaming to ensure no data is lost. This causes the data to be replicated in both Kafka and Spark Streaming.
2. Lack of exactly-once semantics. For background, see http://apache-spark-developers-list.1001551.n3.nabble.com/Which-committers-care-about-Kafka-td9827.html

We want to solve both of these problems in this JIRA. Please see the following design doc for the proposed solution.
https://docs.google.com/a/databricks.com/document/d/1IuvZhg9cOueTf1mq4qwc1fhPb5FVcaRLcyjrtG4XU1k/edit#heading=h.itproy77j3p
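
For readers unfamiliar with the second issue, the following is a minimal sketch of one common way to get exactly-once results from a replayable source such as Kafka: store the output and the consumed offset in a single transaction, so a replayed batch is detected and skipped. It is an illustration only, not the design in the linked doc and not Spark's API. The names `open_store` and `process_batch`, the table layout, and the use of SQLite are all assumptions made for the example.

```python
import sqlite3


def open_store():
    """Create an in-memory store holding results and consumed offsets (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counts (word TEXT PRIMARY KEY, n INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE offsets ("
        "topic TEXT, part INTEGER, until_offset INTEGER NOT NULL, "
        "PRIMARY KEY (topic, part))"
    )
    return conn


def process_batch(conn, topic, partition, from_offset, messages):
    """Apply one batch and advance the stored offset in ONE transaction.

    Returns True if the batch was applied, False if it was skipped because
    its starting offset does not match the last committed offset (a replay
    or an out-of-order batch).
    """
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT until_offset FROM offsets WHERE topic = ? AND part = ?",
            (topic, partition),
        ).fetchone()
        committed = row[0] if row else 0
        if from_offset != committed:
            return False
        for msg in messages:
            # Count each message; the result lives in the same database as the offset.
            conn.execute("INSERT OR IGNORE INTO counts (word, n) VALUES (?, 0)", (msg,))
            conn.execute("UPDATE counts SET n = n + 1 WHERE word = ?", (msg,))
        conn.execute(
            "INSERT OR REPLACE INTO offsets (topic, part, until_offset) VALUES (?, ?, ?)",
            (topic, partition, from_offset + len(messages)),
        )
    return True
```

Because the source can replay any offset range on demand, a sketch like this needs no separate write-ahead copy of the data: after a failure the same range is re-read, and the offset check keeps it from being counted twice.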

Issue Links

supersedes

SPARK-2803 add Kafka stream feature for fetch messages from specified starting offset position (Resolved)

links to

[Github] Pull Request #3798 (koeninger)

[Github] Pull Request #4384 (tdas)

[Github] Pull Request #4511 (koeninger)

People

Assignee: Cody Koeninger

Reporter: Cody Koeninger

Votes: 0

Watchers: 18

Dates

Created: 25/Dec/14 07:39

Updated: 12/Sep/15 00:59

Resolved: 04/Feb/15 20:07