  1. Samza
  2. SAMZA-459

Explicit flush for individual output streams


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.9.0
    • Fix Version/s: None
    • Component/s: container
    • Labels: None

    Description

      From the mailing list:

      http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201411.mbox/%3CCACuX-D8-CS7867ob47fqytCAdvGURc4owv82Rhg2oEJYmr8hpg%40mail.gmail.com%3E

      At the moment, the only way to trigger a flush of the output streams is to call TaskCoordinator.commit, which also flushes the state and saves the checkpoints. There are a few cases where more granularity would be useful: writing out a single stream can be much faster than doing a full commit, and if a user cares about the order in which messages are published, they can disable the autocommit and trigger flushes manually.

      I'd anticipate this to look something like TaskCoordinator.flush(systemStream). It looks like the TaskCoordinator normally only queues up work, instead of doing it synchronously – if that's the case, it should be enough to buffer up all the requested flushes, then perform them in order when the moment comes.
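The buffering idea above can be sketched roughly as follows. This is only an illustration of "queue flush requests, then perform them in order", not Samza's actual API: StreamId, StreamFlusher, FlushQueueingCoordinator and drainTo are hypothetical names standing in for SystemStream, the container's producer plumbing, and the proposed TaskCoordinator.flush. It also assumes that repeated requests for the same stream can be collapsed into one.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for Samza's SystemStream (system name + stream name).
final class StreamId {
    final String system;
    final String stream;

    StreamId(String system, String stream) {
        this.system = system;
        this.stream = stream;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof StreamId)) return false;
        StreamId other = (StreamId) o;
        return system.equals(other.system) && stream.equals(other.stream);
    }

    @Override
    public int hashCode() {
        return Objects.hash(system, stream);
    }

    @Override
    public String toString() {
        return system + "." + stream;
    }
}

// Hypothetical hook: whatever the container uses to flush one output stream.
interface StreamFlusher {
    void flush(StreamId stream);
}

// Sketch of a coordinator that only records flush requests while the task
// runs, and lets the container perform them afterwards in request order.
final class FlushQueueingCoordinator {
    // LinkedHashSet keeps first-request order and drops duplicate requests
    // (an assumption: flushing a stream once per drain is enough).
    private final Set<StreamId> pending = new LinkedHashSet<>();

    // Called by the task: queues the request and performs no I/O.
    void flush(StreamId stream) {
        pending.add(stream);
    }

    // Called by the container "when the moment comes": performs the queued
    // flushes in order, clears the queue, and returns what was flushed.
    List<StreamId> drainTo(StreamFlusher flusher) {
        List<StreamId> requested = new ArrayList<>(pending);
        pending.clear();
        for (StreamId s : requested) {
            flusher.flush(s);
        }
        return requested;
    }
}
```

Under these assumptions, a task would call flush(stream) from its processing code, and the container would call drainTo once the task's current call has returned, in the same place it already handles queued commit requests.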

      Note: you could get almost the same effect by switching to a synchronous system and letting the user send a batch of messages all at once, much as the underlying Kafka client does. This wouldn't let you flush a changelog stream, though.


          People

            Assignee: Unassigned
            Reporter: Ben Kirwin (bkirwi)
