SPARK-30657: Streaming limit after streaming dropDuplicates can throw error


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
    • Fix Version/s: 3.0.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      LocalLimitExec does not fully consume the iterator of its child plan. So if a limit follows a stateful operator such as streaming dedup in append mode (e.g. streamingdf.dropDuplicates().limit(5)), the state changes of the streaming dedup may never be committed (most stateful operators commit their state changes only after the generated iterator is fully consumed). This causes the next batch to fail with java.lang.IllegalStateException: Error reading delta file .../N.delta does not exist, because the state store delta file was never written.
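
      Below is a minimal reproduction sketch of the failure pattern described above. It is not taken from this report: the rate source, the "value" dedup column, the console sink, the trigger interval, and the checkpoint path are illustrative assumptions; any streaming source and sink that exercise dedup-then-limit in append mode should behave the same way.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object StreamingLimitAfterDedupRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-limit-after-dedup") // hypothetical app name
      .master("local[2]")
      .getOrCreate()

    // Any streaming source works; the built-in rate source is used here for convenience.
    val streamingdf = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "10")
      .load()

    // Stateful dedup followed by a limit in append mode: LocalLimitExec stops pulling
    // from the dedup operator's iterator after 5 rows, so the dedup state changes for
    // the batch may never be committed and no state store delta file is written.
    // The next batch then fails with
    // "java.lang.IllegalStateException: Error reading delta file .../N.delta does not exist".
    val query = streamingdf
      .dropDuplicates("value")
      .limit(5)
      .writeStream
      .format("console")
      .outputMode("append")
      .option("checkpointLocation", "/tmp/streaming-limit-dedup-ckpt") // hypothetical path
      .trigger(Trigger.ProcessingTime("5 seconds"))
      .start()

    query.awaitTermination()
  }
}
{code}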


            People

              Assignee: Tathagata Das (tdas)
              Reporter: Tathagata Das (tdas)
