SPARK-27549: Commit Kafka Source offsets to facilitate external tooling


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None

    Description

      Tools that monitor consumer lag would benefit from having the option of persisting the source offsets back to Kafka. Sources implement org.apache.spark.sql.sources.v2.reader.streaming.SparkDataStream, and KafkaMicroBatchStream currently commits nothing in that hook (its commit callback is a no-op), so we could expand it.

      Other streaming engines like Flink allow you to enable `auto.commit` at the expense of not having checkpointing.

      The proposal here is to allow committing the source offsets back to Kafka whenever progress has been made.
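      As a rough sketch of the mechanics this would rely on (not a proposed implementation): the plain Kafka client can already commit arbitrary offsets for a consumer group via commitSync, so the source's commit callback could hand over the end offsets of each completed batch. The group id, topic, and offset values below are made up for illustration.

```scala
import java.util.Properties

import scala.collection.JavaConverters._

import org.apache.kafka.clients.consumer.{KafkaConsumer, OffsetAndMetadata}
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.ByteArrayDeserializer

object CommitBatchOffsets {
  def main(args: Array[String]): Unit = {
    // End offsets of a completed micro-batch, keyed by partition. In the real
    // source these would come from the batch's end offset; the values here
    // are invented for the example.
    val endOffsets = Map(
      new TopicPartition("events", 0) -> 1500L,
      new TopicPartition("events", 1) -> 1432L
    )

    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    // Hypothetical group id; a real implementation would have to derive one
    // from the query (e.g. a user-supplied option or the checkpoint identity).
    props.put("group.id", "structured-streaming-query-42")
    props.put("enable.auto.commit", "false")
    props.put("key.deserializer", classOf[ByteArrayDeserializer].getName)
    props.put("value.deserializer", classOf[ByteArrayDeserializer].getName)

    val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
    try {
      // commitSync accepts an explicit offset map, so no subscribe/poll cycle
      // is needed just to record the group offsets.
      val toCommit = endOffsets.map { case (tp, offset) =>
        tp -> new OffsetAndMetadata(offset)
      }.asJava
      consumer.commitSync(toCommit)
    } finally {
      consumer.close()
    }
  }
}
```

      Where exactly such a commit would live (inside the Kafka source's commit callback or elsewhere) is the design question this ticket raises; the point is only that the offsets are already available when progress is recorded.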

      I am also aware that another option would be to register a StreamingQueryListener, intercept completed batches, and write the offsets wherever you need them, but it would be great if the Kafka integration with Structured Streaming could do some of this work out of the box.
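      For completeness, a minimal sketch of that workaround, assuming the endOffset reported for a Kafka source is the usual {"topic":{"partition":offset}} JSON and that filtering sources by description is good enough to spot the Kafka ones; both are assumptions of the sketch, not something the ticket specifies. It only logs the offsets, but the logging line is where the commitSync call from the previous sketch would go.

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}

/** Extracts the Kafka end offsets of every completed micro-batch. */
class KafkaOffsetListener extends StreamingQueryListener {
  private implicit val formats: Formats = DefaultFormats

  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    event.progress.sources
      .filter(_.description.contains("Kafka")) // crude heuristic to skip non-Kafka sources
      .foreach { source =>
        Option(source.endOffset).foreach { json =>
          // endOffset is a JSON string mapping topic -> partition -> offset.
          val offsets = parse(json).extract[Map[String, Map[String, Long]]]
          for ((topic, partitions) <- offsets; (partition, offset) <- partitions) {
            // A real listener would commit here (e.g. via the commitSync
            // sketch above); this one only logs what it would commit.
            println(s"batch ${event.progress.batchId}: $topic-$partition -> $offset")
          }
        }
      }
  }
}

object RegisterListenerExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-offset-listener").getOrCreate()
    spark.streams.addListener(new KafkaOffsetListener)
    // ...define and start the streaming query as usual...
  }
}
```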

      cody@koeninger.org  marmbrus what do you think?


            People

              Assignee: Unassigned
              Reporter: Stavros Kontopoulos (skonto)
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: