  1. Spark
  2. SPARK-41387

Add assertion on end offset range for Kafka data source with Trigger.AvailableNow

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      Trigger.AvailableNow provides many benefits, but we found one caveat: it is very sensitive to the offset range.

      Trigger.AvailableNow stops the query when the start offset and the end offset are the same, that is, when the data source produces no data. Given the semantics of Trigger.AvailableNow, the data source implementation is expected to retrieve the final offset at the start of the query and then gradually advance the offset range until it eventually reaches the final offset.
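
The termination rule described above can be sketched as a small simulation. This is not Spark code: `run_available_now`, the dict-based offset representation, and the `max_per_batch` limit are hypothetical names chosen for illustration, and the sketch assumes `start` and `final` cover the same partitions with `start <= final` for each.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: partition name -> offset.
Offsets = Dict[str, int]


def run_available_now(
    start: Offsets, final: Offsets, max_per_batch: int
) -> List[Tuple[Offsets, Offsets]]:
    """Simulate micro-batches until a batch would produce no data.

    `final` plays the role of the offsets fetched once at query start.
    Each batch advances every partition by at most `max_per_batch`,
    capped at the final offset for that partition.
    """
    batches: List[Tuple[Offsets, Offsets]] = []
    current = dict(start)
    while True:
        end = {p: min(current[p] + max_per_batch, final[p]) for p in final}
        if end == current:
            # Start offset equals end offset: nothing left to read, so stop.
            break
        batches.append((current, end))
        current = end
    return batches


if __name__ == "__main__":
    for batch_start, batch_end in run_available_now(
        {"p0": 0, "p1": 5}, {"p0": 25, "p1": 8}, max_per_batch=10
    ):
        print(batch_start, "->", batch_end)
```

The loop only terminates because each end offset is capped at the final offset captured up front. If the cap were missing while new data kept arriving, `end` would keep moving and the stop condition would never be met.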

      Any bug that breaks this expectation makes the query run indefinitely, so all data source implementations supporting Trigger.AvailableNow are encouraged to add an assertion that catches such a case up front.

      Among the built-in data sources, only the Kafka data source supports Trigger.AvailableNow without having an assertion on the offset range. We'd like to add such an assertion to the Kafka data source for Trigger.AvailableNow.
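
A minimal sketch of what such an assertion could check, assuming offsets are modeled as a plain mapping from partition name to offset. The function name `assert_end_offsets_within_final` and the error messages are hypothetical; this illustrates the idea only and is not the change made in Spark's Kafka data source.

```python
from typing import Dict


def assert_end_offsets_within_final(
    end: Dict[str, int], final: Dict[str, int]
) -> None:
    """Fail fast if a batch's end offsets are inconsistent with the
    final offsets captured at query start (hypothetical check)."""
    # The set of partitions should match what was captured up front.
    if set(end) != set(final):
        raise AssertionError(
            f"partitions differ: end={sorted(end)} final={sorted(final)}"
        )
    # No partition may advance past its captured final offset.
    for partition, end_offset in end.items():
        if end_offset > final[partition]:
            raise AssertionError(
                f"end offset {end_offset} for {partition} exceeds "
                f"final offset {final[partition]}"
            )


if __name__ == "__main__":
    final_offsets = {"topic-0": 100, "topic-1": 40}
    assert_end_offsets_within_final({"topic-0": 60, "topic-1": 40}, final_offsets)
    print("end offsets are within the captured final offsets")
```

Raising an error here turns a query that would otherwise never stop into an immediate, visible failure.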

          People

            Assignee: Jungtaek Lim (kabhwan)
            Reporter: Jungtaek Lim (kabhwan)
