Samza / SAMZA-974

Build an end-of-stream concept into Samza

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • 0.12.0
    • None
    • None

    Description

      Samza currently works with unbounded data sources. However, bounded data sources such as HDFS files or snapshot files are finite, so we need a notion of 'end-of-stream'.

      The following are the logical tasks:
      1. SystemConsumer will indicate to Samza that the end of stream has been reached for a SystemStreamPartition (SSP).
      2. Samza will shut down the task if all SSPs in the task are at end of stream.
      3. Samza will provide a callback to the task so that it can perform cleanups/commits once all of its SSPs are at end of stream.
      4. Samza will shut down the container if all tasks in the container have been shut down.
      5. Samza will ultimately shut down the job if all containers in the job have been shut down.
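
The five steps above can be sketched as a small bookkeeping structure. This is only an illustration of the proposed propagation (SSP, then task, then container, then job), not Samza's implementation: `EndOfStreamSketch`, `ContainerTracker`, `EndOfStreamCallback`, `markEndOfStream` and `isJobDone` are hypothetical names invented for this example, and SSPs and tasks are represented as plain strings.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types are Samza APIs.
final class EndOfStreamSketch {

    /** Step 3: callback a task could implement to clean up or commit. */
    interface EndOfStreamCallback {
        void onEndOfStream(String taskName);
    }

    /** Tracks, for one container, which SSPs of each task are still open. */
    static final class ContainerTracker {
        private final Map<String, Set<String>> openSspsByTask = new HashMap<>();
        private final Map<String, EndOfStreamCallback> callbacks = new HashMap<>();
        private final Set<String> shutDownTasks = new HashSet<>();

        void registerTask(String taskName, List<String> ssps, EndOfStreamCallback callback) {
            openSspsByTask.put(taskName, new HashSet<>(ssps));
            callbacks.put(taskName, callback);
        }

        /**
         * Step 1: the consumer reports end of stream for one SSP.
         * Returns true once every task in this container has shut down (step 4).
         */
        boolean markEndOfStream(String taskName, String ssp) {
            Set<String> open = openSspsByTask.get(taskName);
            // Unknown tasks and duplicate notifications are ignored.
            boolean removed = open != null && open.remove(ssp);
            if (removed && open.isEmpty()) {
                callbacks.get(taskName).onEndOfStream(taskName); // step 3
                shutDownTasks.add(taskName);                     // step 2
            }
            return isDone();
        }

        boolean isDone() {
            return !openSspsByTask.isEmpty()
                && shutDownTasks.size() == openSspsByTask.size();
        }
    }

    /** Step 5: the job is finished once every container is finished. */
    static boolean isJobDone(List<ContainerTracker> containers) {
        for (ContainerTracker container : containers) {
            if (!container.isDone()) {
                return false;
            }
        }
        return !containers.isEmpty();
    }

    public static void main(String[] args) {
        ContainerTracker container = new ContainerTracker();
        container.registerTask("task-0", Arrays.asList("hdfs.input.0", "hdfs.input.1"),
            name -> System.out.println(name + ": committing and cleaning up"));
        container.markEndOfStream("task-0", "hdfs.input.0");
        boolean done = container.markEndOfStream("task-0", "hdfs.input.1");
        System.out.println("container done: " + done);
        System.out.println("job done: " + isJobDone(Arrays.asList(container)));
    }
}
```

In this sketch the callback fires exactly once per task, immediately before the task is counted as shut down, so a task gets a chance to commit before the container-level check runs.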

      This is a step towards realizing a 'finite' Samza job that terminates (as opposed to an infinite stream job that keeps running) once data processing is complete.

      Attachments

        1. ProposalforEndofStreaminSamza (2).pdf
          130 kB
          Jagadish
        2. ProposalforEndofStreaminSamza.pdf
          121 kB
          Jagadish

            People

              Assignee: Jagadish (jagadish1989@gmail.com)
              Reporter: Jagadish (jagadish1989@gmail.com)
              Votes: 1
              Watchers: 6
