Apache Hudi / HUDI-7912

Cannot restart Spark Structured Streaming jobs after Kafka offsets go out of range


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: spark
    • Labels: None

    Description

      When using Spark Structured Streaming to read from Kafka and write to Hudi, a job sometimes cannot keep up with the input rate and fails because the Kafka offset goes out of range (i.e., the earliest Kafka messages have been cleaned up by the retention policy). When we then try to restart the job by clearing the previous checkpoint and consuming from the latest offset, we see that the batches are skipped by the 'HoodieStreamingSink'.

      There is currently no way to restart these streams.
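
      The report does not say why the batches are skipped. One plausible explanation, offered here as an assumption rather than a confirmed diagnosis, is that the sink remembers the id of the last micro-batch it committed to the table and treats any incoming batch with an equal or smaller id as a replay. Clearing the Spark checkpoint makes the restarted query number its batches from 0 again, so new batches would look like duplicates until their ids pass the remembered one. The toy model below only illustrates that assumed mechanism; `ToySink`, `add_batch`, and `last_committed_batch_id` are hypothetical names and are not Hudi's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToySink:
    """Hypothetical stand-in for a streaming sink that de-duplicates by batch id."""

    # Assumed to be persisted with the table, so it survives a checkpoint reset.
    last_committed_batch_id: Optional[int] = None
    written: List[str] = field(default_factory=list)

    def add_batch(self, batch_id: int, payload: str) -> bool:
        """Write the batch unless its id looks already committed; return True if written."""
        if (
            self.last_committed_batch_id is not None
            and batch_id <= self.last_committed_batch_id
        ):
            return False  # treated as a replay of an already committed batch
        self.written.append(payload)
        self.last_committed_batch_id = batch_id
        return True


sink = ToySink()

# First run: the query's checkpoint assigns batch ids 0..4, and all are written.
first_run = [sink.add_batch(i, f"run1-batch{i}") for i in range(5)]

# The checkpoint is cleared, so the restarted query numbers batches from 0 again,
# while the sink still remembers batch id 4 from the first run.
second_run = [sink.add_batch(i, f"run2-batch{i}") for i in range(7)]

print(first_run)   # every batch of the first run is written
print(second_run)  # batches 0..4 of the second run are skipped, 5 and 6 are written
```

      If this assumption holds, the data carried by the skipped batches of the restarted query would never reach the table, which matches the symptom described above.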

          People

            Assignee: Unassigned
            Reporter: Aditya Goenka (adityagoenka)
            Votes: 0
            Watchers: 1
