Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-13715

Kafka commit offset drop data on failure for runners that have non-checkpointing shuffle

Details

    • Bug
    • Status: Open
    • P1
    • Resolution: Unresolved
    • 2.26.0, 2.27.0, 2.28.0, 2.29.0, 2.30.0, 2.31.0, 2.32.0, 2.33.0, 2.34.0, 2.35.0, 2.36.0
    • None
    • io-java-kafka
    • None

    Description

      A user noticed that we commit Kafka offsets without any obvious checkpointing. We use a Reshuffle.byRandomKey() to cause Dataflow and the SparkRunner to checkpoint. But on runners with non-checkpointing shuffle, this risks data loss.

       

      The modern solution is to use @RequiresStableInput. This is not perfectly/fully implemented across many runners, so we still need the explicit shuffle for now.

       

      https://stackoverflow.com/questions/70785672/apache-beam-kafkaio-commit-offset-behaviour

      Attachments

        Activity

          People

            Unassigned Unassigned
            kenn Kenneth Knowles
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: