SPARK-24720

Kafka transactions create non-consecutive offsets (transaction markers consume an offset), making streaming fail when failOnDataLoss=true


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: DStreams

    Description

      When Kafka transactions are used, sending one message to Kafka produces two offsets: one for the data itself and one for the marker that commits the transaction.
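      A minimal sketch (not from the issue) of how a transactional producer leaves such an offset gap; the broker address, transactional.id, and topic name below are placeholders:

{code:scala}
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object TransactionalOffsetGap {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // placeholder broker
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("transactional.id", "demo-tx")         // placeholder id

    val producer = new KafkaProducer[String, String](props)
    producer.initTransactions()

    // The record itself takes offset 0; the commit marker written by the
    // broker takes offset 1.
    producer.beginTransaction()
    producer.send(new ProducerRecord("demo-topic", "k", "v1"))
    producer.commitTransaction()

    // The next record therefore lands at offset 2, not offset 1.
    producer.beginTransaction()
    producer.send(new ProducerRecord("demo-topic", "k", "v2"))
    producer.commitTransaction()

    producer.close()
  }
}
{code}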

      When the Kafka connector for Spark streaming reads a topic with non-consecutive offsets, it fails. SPARK-17147 fixed this for compacted topics, but not for Kafka transactions: if a partition contains one message plus one transaction commit marker, Spark tries to read the offset range [0, 2). Offset 0 (containing the message) is read, but offset 1 never returns a value, and buffer.hasNext() remains false even after a poll, because offset 1 holds the transaction commit marker rather than data.
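      A minimal sketch of a read that would hit this failure, assuming the structured streaming Kafka source (spark-sql-kafka-0-10); the broker address and topic name are placeholders, not taken from the issue:

{code:scala}
import org.apache.spark.sql.SparkSession

object ReadTransactionalTopic {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-transaction-gap")
      .master("local[2]")
      .getOrCreate()

    // With failOnDataLoss=true (the default), the offset left empty by the
    // transaction commit marker is treated as data loss and the query fails
    // instead of skipping over the gap.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
      .option("subscribe", "demo-topic")                   // placeholder topic
      .option("failOnDataLoss", "true")
      .load()

    df.selectExpr("CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}
{code}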

       

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: Quentin Ambard (qambard)
            Votes: 0
            Watchers: 5
