Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
Description
Currently, when the driver goes down, any uncheckpointed data within Spark is lost. If the system from which messages are pulled can replay them, the data may still be recoverable, but some systems, such as Flume, cannot replay messages. In addition, all state held by windowing functions is lost.
We must persist the raw data somehow and be able to replay it when required. We must also persist windowing state along with the data itself.
This will likely require a substantial amount of work and will probably have to be split into several sub-JIRAs.
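For reference, the mechanism that eventually addressed this (receiver write-ahead logs, per the duplicate ticket SPARK-3129) is enabled through configuration. A minimal sketch follows; the configuration key is the one shipped with the released feature (Spark 1.2+), while the checkpoint directory path is purely illustrative:

```
# spark-defaults.conf fragment (sketch, not a definitive setup)

# Log received blocks to reliable storage before acknowledging them,
# so they can be replayed if the driver fails and restarts.
spark.streaming.receiver.writeAheadLog.enable  true
```

The write-ahead log is written under the streaming checkpoint directory, so the application must also set one, e.g. `ssc.checkpoint("hdfs:///path/to/checkpoints")` on its `StreamingContext` (path illustrative). Checkpointing to a fault-tolerant filesystem such as HDFS is what allows both the raw blocks and the windowing state to survive a driver restart.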
Issue Links
- duplicates
  - SPARK-3129 Prevent data loss in Spark Streaming on driver failure using Write Ahead Logs (Resolved)
- relates to
  - SPARK-1645 Improve Spark Streaming compatibility with Flume (Resolved)