  1. Apache Storm
  2. STORM-1028

Eventhub spout meta data


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Event Hub (and Kafka) fits well into event-sourcing architectures as the event ingest point, with Storm processing the events and feeding downstream stateful consumers.

      Advanced event stream processing, such as replaying parts of a stream, requires that downstream consumers can synchronise different "stream runs" with their stateful view, which can itself be seen as an aggregation of all previous events. To set up the right context for re-processing the stream deterministically, they need to sync their view with the incoming old data. To do this, they need to know each event's sequenceNumber and partition.

      For example, suppose a bolt calculates total_order_amount for a stream of orders and emits each order tuple together with the total_order_amount calculated over all previous orders. Replaying an order event should then not change total_order_amount. That is, orders with a higher sequenceNumber than the order being processed should not be included in total_order_amount.

      This synchronisation can be achieved if the bolt has access to the partition and sequenceNumber from Event Hub.
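
The order-total example above can be sketched as follows. This is only an illustration of the idea, not Storm or Event Hub spout code: the class name `ReplayAwareOrderTotal` and its `process` method are hypothetical, the total is assumed to include the order being processed, and events within a partition are assumed to arrive in increasing sequenceNumber order on first delivery.

```python
from bisect import bisect_right
from collections import defaultdict


class ReplayAwareOrderTotal:
    """Hypothetical bolt-side state: a running order total per partition
    that gives the same answer when an old event is replayed."""

    def __init__(self):
        # Per partition: the sequence numbers seen so far (ascending) and
        # the running total as of each of those sequence numbers.
        self._seqs = defaultdict(list)
        self._totals = defaultdict(list)

    def process(self, partition, sequence_number, amount):
        """Return total_order_amount as of this event's sequence number."""
        seqs = self._seqs[partition]
        totals = self._totals[partition]

        if not seqs or sequence_number > seqs[-1]:
            # First delivery of a new event: extend the stateful view.
            seqs.append(sequence_number)
            totals.append((totals[-1] if totals else 0) + amount)
            return totals[-1]

        # Replay: answer from the view as it was at that sequence number,
        # so orders with a higher sequence number are left out.
        idx = bisect_right(seqs, sequence_number)
        return totals[idx - 1] if idx else 0


# Usage: amounts are integers (e.g. cents) to keep the sums exact.
state = ReplayAwareOrderTotal()
state.process("0", 1, 10)
state.process("0", 2, 5)
state.process("0", 3, 20)
replayed_total = state.process("0", 2, 5)  # replay of sequence number 2
```

Without the partition and sequenceNumber on the tuple, the bolt could not tell a replayed order from a new one and would add its amount to the total a second time.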


          People

            Assignee: Unassigned
            Reporter: mtandrup Mads Mætzke Tandrup
            Votes: 0
            Watchers: 2
