  Spark / SPARK-25052

Is there any possibility that Spark Structured Streaming generates duplicates in the output?


Details

    • Type: Question
    • Status: Closed
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      We recently observed that a Spark Structured Streaming query produced duplicate records in its output. The query reads from a Kafka topic and writes its output to S3, with the checkpoint location also on S3. We have run into this issue twice, but we have not been able to reproduce it. Has anyone faced this kind of issue before? Could it be caused by S3's eventual consistency?
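      To make the question concrete, here is a toy model in plain Python of how a replayed micro-batch can turn into duplicate output. It is not Spark code: `ToySink`, `run`, `crash_after` and `stale_log_on_restart` are hypothetical names. The sketch assumes the general recovery pattern of Structured Streaming (after a failure, a micro-batch whose completion was not yet recorded in the checkpoint is executed again) and a sink that, like Spark's file sink, records the IDs of the batches it has committed and skips those IDs on replay. The last case assumes, hypothetically, that the restarted query reads a stale view of that record, which is the kind of effect eventually consistent S3 listings could have.

```python
from dataclasses import dataclass, field


@dataclass
class ToySink:
    """Hypothetical output location plus a log of committed batch IDs."""
    idempotent: bool
    files: list = field(default_factory=list)    # rows "written to S3"
    committed: set = field(default_factory=set)  # batch IDs the sink has committed

    def add_batch(self, batch_id, rows):
        # An idempotent sink skips a batch ID it has already committed.
        if self.idempotent and batch_id in self.committed:
            return
        self.files.extend(rows)
        self.committed.add(batch_id)


def run(sink, batches, crash_after=None, stale_log_on_restart=False):
    """Run batches (indexed by batch ID) through the sink.

    If crash_after is set, the query "crashes" once, right after the sink
    write of that batch but before the checkpoint records it as done, so
    the batch is replayed on restart.
    """
    last_done = -1  # last batch ID recorded as complete in the checkpoint
    crashed = False
    batch_id = 0
    while batch_id < len(batches):
        sink.add_batch(batch_id, batches[batch_id])
        if batch_id == crash_after and not crashed:
            crashed = True
            if stale_log_on_restart:
                # Hypothetical stale view: the sink's record of this batch
                # is not visible to the restarted query.
                sink.committed.discard(batch_id)
            batch_id = last_done + 1  # restart from the checkpoint: replay
            continue
        last_done = batch_id
        batch_id += 1
    return sink.files


batches = [["a", "b"], ["c"], ["d"]]
print(run(ToySink(idempotent=False), batches, crash_after=1))
print(run(ToySink(idempotent=True), batches, crash_after=1))
print(run(ToySink(idempotent=True), batches, crash_after=1,
          stale_log_on_restart=True))
```

      In this model, the non-idempotent sink writes `c` twice after the replay, the batch-ID check prevents that, and the check fails again only when it reads stale state.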


          People

            Assignee: Unassigned
            Reporter: bharath kumar avusherla (abharath9)
            Votes: 0
            Watchers: 2
