SPARK-6222

[STREAMING] All data may not be recovered from WAL when driver is killed



    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.1, 1.4.0
    • Component/s: DStreams
    • Labels: None


      While testing for our next release, an internal test written by Wing Yew Poon caught a regression in Spark Streaming between 1.2.0 and 1.3.0. The test runs a FlumePolling stream to read data from Flume, then kills the Application Master. Once YARN restarts it, the test waits until no more data is being written and then verifies the original data against the data on HDFS. The test passed on 1.2.0 but fails on 1.3.0.
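
      The verification step described above can be pictured roughly as follows. This is only a minimal sketch, not the actual internal test: the function name `find_lost_records` and the sample records are hypothetical, and the real test reads the recovered data from HDFS rather than from an in-memory list. The sketch assumes that records replayed after the restart may show up more than once, so it flags only records that are missing, not duplicates.

```python
from collections import Counter


def find_lost_records(sent, recovered):
    """Return the records that were sent more often than they were recovered.

    Hypothetical helper for illustration only; the real test compares the
    original input against files written to HDFS.
    """
    # Counter subtraction keeps only positive counts, so extra copies in
    # `recovered` (e.g. records replayed after the driver restart) are ignored.
    missing = Counter(sent) - Counter(recovered)
    return sorted(missing.elements())


# Made-up sample data: five events sent, two never reach the output.
sent = [f"event-{i}" for i in range(5)]
recovered = ["event-0", "event-1", "event-1", "event-4"]

lost = find_lost_records(sent, recovered)
print(lost)  # ['event-2', 'event-3']
```

      A non-empty result is what a data loss failure would look like in this simplified picture; an empty list means every sent record was found at least once.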

      Since the test is tied into Cloudera's internal infrastructure and build process, it cannot be run directly on an Apache build. However, I have been working on isolating the commit that introduced the regression, and I have confirmed that it was caused by SPARK-5147 (PR #4149). I verified this several times using the test, and the failure is consistently reproducible.

      To re-confirm, I reverted just this one commit (and the Clock consolidation commit, to avoid conflicts), and the issue was no longer reproducible.

      Since this is a data loss issue, I believe this is a blocker for Spark 1.3.0.
      /cc Tathagata Das, Patrick Wendell


        1. AfterPatch.txt
          423 kB
          Hari Shreedharan
        2. CleanWithoutPatch.txt
          508 kB
          Hari Shreedharan
        3. SPARK-6122.patch
          0.9 kB
          Hari Shreedharan

            hshreedharan Hari Shreedharan