SPARK-34321

Fix the guarantee of foreachBatch


Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None

    Description

      As with the ForeachWriter documentation addressed in SPARK-28650, the foreachBatch API documentation also states a guarantee:

      The batchId can be used to deduplicate and transactionally write the output (that is, the provided Dataset) to external systems. The output Dataset is guaranteed to be exactly the same for the same batchId.

      However, for the same reason that the ForeachWriter documentation was fixed in SPARK-28650, this guarantee is not hard to break, for example by changing the number of partitions.
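To make the quoted "deduplicate and transactionally write" usage concrete, here is a minimal sketch of a sink that records each committed batch ID in the same transaction as the data, so a replayed batch is skipped. It is an illustration only and does not use Spark: the SQLite store, the table names `committed_batches` and `output`, and the helper `make_idempotent_writer` are all hypothetical. The callback takes `(rows, batch_id)`, mirroring the `(batch_df, batch_id)` shape that PySpark passes to a `foreachBatch` function, with a plain list standing in for the batch DataFrame.

```python
import sqlite3


def make_idempotent_writer(conn):
    """Return a callback shaped like a foreachBatch function: (rows, batch_id).

    Hypothetical helper: `rows` is a plain list standing in for the batch data.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS committed_batches (batch_id INTEGER PRIMARY KEY)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS output (batch_id INTEGER, value TEXT)"
    )
    conn.commit()

    def write_batch(rows, batch_id):
        # `with conn` commits on success and rolls back if an exception is raised,
        # so the data rows and the batch marker are written atomically.
        with conn:
            seen = conn.execute(
                "SELECT 1 FROM committed_batches WHERE batch_id = ?", (batch_id,)
            ).fetchone()
            if seen:
                # This batch ID was already committed: skip the replay.
                return False
            conn.executemany(
                "INSERT INTO output (batch_id, value) VALUES (?, ?)",
                [(batch_id, value) for value in rows],
            )
            conn.execute(
                "INSERT INTO committed_batches (batch_id) VALUES (?)", (batch_id,)
            )
        return True

    return write_batch


conn = sqlite3.connect(":memory:")
write_batch = make_idempotent_writer(conn)
write_batch(["a", "b"], 0)   # first delivery of batch 0 is written
write_batch(["a", "b"], 0)   # replay of batch 0 is skipped
write_batch(["c"], 1)        # batch 1 is written
```

Note that this sketch keys only on the batch ID: on a replay it keeps whatever was committed first and never compares contents, so it does not by itself verify that the replayed batch matches the original.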


          People

            Assignee: L. C. Hsieh (viirya)
            Reporter: L. C. Hsieh (viirya)