Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-24067

Backport SPARK-17147 to 2.3 (Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction))

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.1
    • Component/s: DStreams
    • Labels:
      None

      Description

      SPARK-17147 fixes a problem with non-consecutive Kafka Offsets. The  PR was merged to master. This should be backported to 2.3.

       

      Original Description from SPARK-17147 :

       

      When Kafka does log compaction offsets often end up with gaps, meaning the next requested offset will be frequently not be offset+1. The logic in KafkaRDD & CachedKafkaConsumer has a baked in assumption that the next offset will always be just an increment of 1 above the previous offset.

      I have worked around this problem by changing CachedKafkaConsumer to use the returned record's offset, from:
      nextOffset = offset + 1
      to:
      nextOffset = record.offset + 1

      and changed KafkaRDD from:
      requestOffset += 1
      to:
      requestOffset = r.offset() + 1

      (I also had to change some assert logic in CachedKafkaConsumer).

      There's a strong possibility that I have misconstrued how to use the streaming kafka consumer, and I'm happy to close this out if that's the case. If, however, it is supposed to support non-consecutive offsets (e.g. due to log compaction) I am also happy to contribute a PR.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                koeninger Cody Koeninger
                Reporter:
                jhereth Joachim Hereth
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: