  1. Spark
  2. SPARK-1647

Prevent data loss when Streaming driver goes down


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: DStreams
    • Labels: None

      Description

      Currently, when the driver goes down, any uncheckpointed data held within Spark is lost. If the system from which messages are pulled can replay them, the data may still be available, but some systems, such as Flume, cannot do this.

      All windowing information for windowing functions is lost as well.

      We must persist the raw data and be able to replay it when required. We must also persist windowing information along with the data itself.
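
      One way to picture the requirement above is an append-only log that is written before data is processed and read back after a restart. The following is only a minimal sketch under stated assumptions, not Spark's implementation: `WriteAheadLog`, `append`, `replay`, `demo`, the `window_start` field, and the `receiver.wal` file name are all hypothetical names chosen for illustration.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Hypothetical append-only log for illustration; not a Spark API."""

    def __init__(self, path):
        self.path = path

    def append(self, window_start, payload):
        # Persist the record, tagged with its window, before it is processed.
        record = {"window_start": window_start, "payload": payload}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the bytes to disk

    def replay(self):
        # Yield every persisted record in arrival order, e.g. after a restart.
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)


def demo():
    log_path = os.path.join(tempfile.mkdtemp(), "receiver.wal")
    wal = WriteAheadLog(log_path)
    wal.append(0, "event-a")
    wal.append(0, "event-b")
    wal.append(10, "event-c")

    # Simulate a driver restart: a fresh object reads the same file.
    recovered = WriteAheadLog(log_path)
    windows = {}
    for record in recovered.replay():
        windows.setdefault(record["window_start"], []).append(record["payload"])
    return windows
```

      Calling `demo()` rebuilds the per-window grouping purely from the file, which is the property the description asks for: the raw data and its windowing information survive together. A real design would also need log truncation and a reliable storage layer, which this sketch leaves out.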

      This will likely require a substantial amount of work and will probably have to be split into several sub-JIRAs.


              People

              • Assignee:
                hshreedharan Hari Shreedharan
                Reporter:
                hshreedharan Hari Shreedharan
              • Votes:
                0
                Watchers:
                10
