[SPARK-25052] Is there any possibility that Spark Structured Streaming generates duplicates in the output? - ASF JIRA

Details

Type: Question
Status: Closed
Priority: Minor
Resolution: Invalid
Affects Version/s: 2.3.0
Fix Version/s: None
Component/s: Spark Core
Labels: None

Description

We recently observed that Spark Structured Streaming generated duplicates in the output when reading from a Kafka topic and writing the output to S3 (with checkpointing also in S3). We have run into this issue twice, but it is not reproducible. Has anyone faced this kind of issue before? Could it be caused by S3 eventual consistency?
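
Because the issue is not reproducible, one way to confirm that the output really contains duplicates is to compare the Kafka coordinates of the written records, since a Kafka record is uniquely identified by its topic, partition, and offset. The following is only a minimal sketch in plain Python. It assumes (the report does not say so) that each output row still carries the `topic`, `partition`, and `offset` of its source record; the function name `find_duplicate_offsets` and the sample rows are hypothetical.

```python
from collections import Counter


def find_duplicate_offsets(records):
    """Return the (topic, partition, offset) triples that occur more than once."""
    # Count how often each Kafka coordinate appears in the output rows.
    counts = Counter((r["topic"], r["partition"], r["offset"]) for r in records)
    # Any coordinate seen more than once was written to the sink repeatedly.
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical sample of rows read back from the output location.
sample = [
    {"topic": "events", "partition": 0, "offset": 41},
    {"topic": "events", "partition": 0, "offset": 42},
    {"topic": "events", "partition": 0, "offset": 42},  # written twice
    {"topic": "events", "partition": 1, "offset": 7},
]

print(find_duplicate_offsets(sample))
```

If the check returns an empty list, the apparent duplicates are more likely distinct Kafka records with identical payloads (for example, produced twice upstream) than rows written twice by the streaming job.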

People

Assignee: Unassigned

Reporter: bharath kumar avusherla

Votes: 0

Watchers: 2

Dates

Created: 07/Aug/18 22:40

Updated: 12/Dec/22 18:10

Resolved: 08/Aug/18 02:14