Samza / SAMZA-252

Document stream reprocessing

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.7.0
    • Component/s: docs
    • Labels: None

    Description

      A common need in stream processing is to re-process earlier messages at some later date. For example, a stream processing job might classify messages using a machine learning algorithm. At some point, the algorithm is updated with a more accurate vector of weights, and you usually want to re-process past messages to get more accurate results. This is usually solved by running a parallel pipeline in Hadoop.

      We have thought extensively about this use case and should document how to use Samza for re-processing.
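
The classifier example in the description can be illustrated with a minimal sketch. It does not use any Samza API; it only models the idea that the input is a retained, replayable log and that re-processing means consuming it again from the first offset with updated weights. The names LOG, classify, process, WEIGHTS_V1 and WEIGHTS_V2, the "spam"/"ham" labels, and the feature values are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical retained input stream: each message is a small feature vector.
LOG: List[Sequence[float]] = [
    (1.0, 0.0),
    (0.2, 0.9),
    (0.6, 0.5),
]


def classify(weights: Sequence[float], features: Sequence[float]) -> str:
    """Linear classifier: a positive score is labelled 'spam' (illustrative labels)."""
    score = sum(w * x for w, x in zip(weights, features))
    return "spam" if score > 0 else "ham"


def process(log: List[Sequence[float]],
            weights: Sequence[float],
            start_offset: int = 0) -> Dict[int, str]:
    """Consume the log from start_offset onward and return offset -> label."""
    return {
        offset: classify(weights, log[offset])
        for offset in range(start_offset, len(log))
    }


WEIGHTS_V1 = (1.0, -1.0)   # weights used by the original job (assumed values)
WEIGHTS_V2 = (1.0, -1.5)   # updated, supposedly more accurate weights (assumed values)

# First pass with the old weights.
original = process(LOG, WEIGHTS_V1)

# Re-processing: rewind to offset 0 and run the same logic with the new weights.
reprocessed = process(LOG, WEIGHTS_V2, start_offset=0)

# Offsets whose classification changed after the update.
changed = [offset for offset in original if original[offset] != reprocessed[offset]]
```

In this sketch the re-processing pass writes to a separate result (reprocessed) rather than overwriting the first one, so both outputs can be compared before switching consumers over. That is one possible design choice, not a statement of how the Samza documentation handles it.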

      Attachments

        1. SAMZA-252.1.patch
          9 kB
          Martin Kleppmann
        2. SAMZA-252.2.patch
          15 kB
          Martin Kleppmann

          People

            Assignee: Martin Kleppmann (martinkl)
            Reporter: Chris Riccomini (criccomini)
            Votes: 0
            Watchers: 4
