Spark / SPARK-1647

Prevent data loss when Streaming driver goes down

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: DStreams
    • Labels: None

    Description

      Currently, when the driver goes down, any data that has not been checkpointed is lost from within Spark. If the system from which messages are pulled can replay them, the data may still be recoverable, but some systems, such as Flume, cannot do this.

      In addition, all windowing information for windowing functions is lost.

      We must persist raw data somehow, and be able to replay this data if required. We also must persist windowing information with the data itself.

      This will likely require a substantial amount of work and will probably have to be split into several sub-JIRAs.
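
      The persist-and-replay requirement described above can be illustrated with a minimal write-ahead log sketch. This is not Spark's implementation or API: the names WriteAheadLog, recover_windows and window_start, the JSON-lines file format, and the temporary file path are all hypothetical, chosen only to show a record being written to durable storage together with its window metadata and then replayed after a simulated driver restart.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Hypothetical append-only log; an illustration, not part of Spark."""

    def __init__(self, path):
        self.path = path

    def append(self, window_start, record):
        # Store the record together with its window metadata and force it
        # to disk before the caller would acknowledge the source.
        entry = {"window_start": window_start, "record": record}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        # Yield every logged entry in arrival order.
        # A missing file simply means nothing was logged.
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                yield entry["window_start"], entry["record"]


def recover_windows(log):
    """Rebuild per-window buffers from the log after a restart."""
    windows = {}
    for window_start, record in log.replay():
        windows.setdefault(window_start, []).append(record)
    return windows


def demo():
    log_path = os.path.join(tempfile.mkdtemp(), "received.log")
    wal = WriteAheadLog(log_path)
    wal.append(0, "event-a")
    wal.append(0, "event-b")
    wal.append(10, "event-c")
    # Simulate losing the driver: discard the in-memory object and
    # recover purely from what was written to disk.
    del wal
    return recover_windows(WriteAheadLog(log_path))
```

      Calling demo() rebuilds the window buffers from disk alone. A real design would also need log truncation once data is checkpointed, and storage that survives the loss of the machine itself, which this sketch does not attempt.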

            People

              Assignee: Hari Shreedharan (hshreedharan)
              Reporter: Hari Shreedharan (hshreedharan)
              Votes: 0
              Watchers: 8

              Dates

                Created:
                Updated:
                Resolved: