Bahir / BAHIR-223

Concern around reliability of sql-streaming-sqs


Details

    Description

      Looking at the source for the sql-streaming-sqs connector, it seems that we delete the messages from SQS on every fetchMaxOffset() call.

      https://github.com/apache/bahir/blob/3912360ca5bcca269a30ff42120cac46934693c4/sql-streaming-sqs/src/main/scala/org/apache/spark/sql/streaming/sqs/SqsSource.scala#L106

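      My understanding of a Spark streaming source is that a call to the commit() method signals that Spark has finished processing up to the given offset. Should we not delete the SQS messages on a call to commit() instead? As far as I can tell, if a batch fails after fetchMaxOffset() has returned, the messages it fetched would already be gone from the queue.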
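
To make the proposal concrete, here is a minimal sketch of deferring deletion until commit(). It is not the connector's code: `Message`, `FakeQueue`, and `DeferredDeleteSource` are hypothetical names, the queue is an in-memory stand-in rather than the AWS SDK client, and offsets are simplified to a batch counter. The only idea it illustrates is that receipt handles are remembered at fetch time and deleted when the offset is committed.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an SQS message; not the AWS SDK type.
final case class Message(receiptHandle: String, body: String)

// Hypothetical in-memory queue. Received messages stay "in flight"
// until they are explicitly deleted.
final class FakeQueue(initial: Seq[Message]) {
  private val visible = mutable.Queue(initial: _*)
  private val inFlight = mutable.Map.empty[String, Message]

  // Hand out up to `max` messages and mark them as in flight.
  def receive(max: Int): Seq[Message] = {
    val n = math.min(max, visible.size)
    val batch = (1 to n).map(_ => visible.dequeue())
    batch.foreach(m => inFlight(m.receiptHandle) = m)
    batch
  }

  // Permanently remove in-flight messages by receipt handle.
  def delete(receiptHandles: Iterable[String]): Unit =
    receiptHandles.foreach(h => inFlight.remove(h))

  // Messages not yet deleted, whether visible or in flight.
  def remaining: Int = visible.size + inFlight.size
}

// Hypothetical source that defers deletion until commit().
final class DeferredDeleteSource(queue: FakeQueue) {
  private var latestOffset = 0L
  // Receipt handles awaiting commit, keyed by the offset their batch produced.
  private val pending = mutable.TreeMap.empty[Long, Seq[String]]

  // Fetch new messages and advance the offset, but do NOT delete anything yet.
  def fetchMaxOffset(max: Int): Long = {
    val batch = queue.receive(max)
    if (batch.nonEmpty) {
      latestOffset += 1
      pending(latestOffset) = batch.map(_.receiptHandle)
    }
    latestOffset
  }

  // Processing up to `end` is complete, so deletion is now safe.
  def commit(end: Long): Unit = {
    val done = pending.keys.takeWhile(_ <= end).toList
    done.foreach { offset =>
      queue.delete(pending(offset))
      pending.remove(offset)
    }
  }
}
```

With this shape, a failure between fetchMaxOffset() and commit() leaves the messages undeleted. In real SQS, messages that are received but not deleted become visible again once their visibility timeout expires, so they could be fetched again; a real implementation would therefore also need to handle redelivery and duplicates.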


          People

            Assignee: Unassigned
            Reporter: Ayush Verma (ayushverma)
