[SPARK-30657] Streaming limit after streaming dropDuplicates can throw error


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
    • Fix Version/s: 3.0.0
    • Component/s: Structured Streaming
    • Labels: None
    • Target Version/s:

      Description

      LocalLimitExec does not fully consume the iterator of its child plan. So if a limit follows a stateful operator such as streaming dedup in append mode (e.g. streamingdf.dropDuplicates().limit(5)), the state changes of the streaming dedup may never be committed, because most stateful operators commit their state changes only after the generated iterator is fully consumed. Since the state store delta file is never written, the next batch fails with java.lang.IllegalStateException: Error reading delta file .../N.delta does not exist.
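
      For reference, below is a minimal sketch of a query with this shape. The source, column name, sink, and query name are illustrative assumptions, not taken from this report; any streaming source feeding a dropDuplicates followed by limit in append mode has the same structure.

      {code:scala}
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder()
        .appName("dedup-then-limit")
        .master("local[2]")
        .getOrCreate()

      // The built-in rate source emits (timestamp, value) rows; it is used here
      // only for convenience.
      val streamingDf = spark.readStream
        .format("rate")
        .option("rowsPerSecond", "10")
        .load()

      // dropDuplicates is a stateful operator. With limit(5) on top of it,
      // LocalLimitExec stops pulling rows from the dedup iterator once 5 rows
      // have been emitted, so the dedup state store may never commit its delta
      // file for the batch; the next batch then fails reading the missing
      // N.delta file.
      val query = streamingDf
        .dropDuplicates("value")
        .limit(5)
        .writeStream
        .format("memory")
        .queryName("dedup_limit")
        .outputMode("append")
        .start()

      query.awaitTermination()
      {code}

      With this shape, the first batch can appear to succeed while the next batch fails with the IllegalStateException described above.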

              People

              • Assignee: tdas Tathagata Das
              • Reporter: tdas Tathagata Das
              • Votes: 0
              • Watchers: 3
