Spark / SPARK-34297

Add metrics for data loss and offset out range for KafkaMicroBatchStream


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: SQL, Structured Streaming
    • Labels: None

    Description

      When testing Structured Streaming (SS), I found it hard to track data loss when SS reads from Kafka. The micro-batch scan node has only one metric, the number of output rows, so users cannot tell how many times the offsets to fetch were out of range in Kafka, or how many times data loss occurred.
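      To make the gap concrete, here is a minimal, hypothetical Python sketch of the kind of counters the issue asks for. It is not Spark's implementation (the real Kafka source runs on the JVM). Each simulated task records how often its requested offset range fell outside what the Kafka partition still holds, and how often that meant records were skipped; the driver then sums the per-task counts, as a scan node metric would. `TaskMetrics`, `read_partition`, `aggregate`, and the metric names are assumptions made up for illustration, and the out-of-range and data-loss rules are deliberately simplified.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskMetrics:
    """Hypothetical counters one partition-reading task reports."""
    offset_out_of_range: int = 0
    data_loss: int = 0


def read_partition(requested_from: int, requested_until: int,
                   earliest: int, latest: int) -> TaskMetrics:
    """Simulate reading offsets [requested_from, requested_until) from a
    Kafka partition that currently holds offsets [earliest, latest).

    Simplified rules (assumptions, not Spark's exact logic):
    - any requested offset outside [earliest, latest) counts as one
      offset-out-of-range event;
    - a start below `earliest` (e.g. records aged out by retention) also
      counts as one data-loss event, since those records cannot be read.
    """
    m = TaskMetrics()
    if requested_from < earliest or requested_until > latest:
        m.offset_out_of_range += 1
    if requested_from < earliest:
        m.data_loss += 1
    return m


def aggregate(task_metrics: List[TaskMetrics]) -> Dict[str, int]:
    """Driver-side sum of the per-task counters."""
    return {
        "offset_out_of_range": sum(t.offset_out_of_range for t in task_metrics),
        "data_loss": sum(t.data_loss for t in task_metrics),
    }


if __name__ == "__main__":
    tasks = [
        read_partition(0, 100, earliest=0, latest=200),     # fully available
        read_partition(50, 150, earliest=120, latest=300),  # start aged out
        read_partition(180, 250, earliest=0, latest=200),   # end not yet written
    ]
    print(aggregate(tasks))  # {'offset_out_of_range': 2, 'data_loss': 1}
```

      The third task shows why two separate counters are useful: an out-of-range request does not always imply lost records.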

          People

            Assignee: L. C. Hsieh (viirya)
            Reporter: L. C. Hsieh (viirya)
            Votes: 0
            Watchers: 2
