Samza / SAMZA-459

Explicit flush for individual output streams

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.9.0
    • Fix Version/s: None
    • Component/s: container
    • Labels: None

    Description

      From the mailing list:

      http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201411.mbox/%3CCACuX-D8-CS7867ob47fqytCAdvGURc4owv82Rhg2oEJYmr8hpg%40mail.gmail.com%3E

      At the moment, the only way to trigger a flush of the output streams is to call TaskCoordinator.commit, which also flushes the state and saves the checkpoints. There are a few cases where more granularity would be useful: writing out a single stream can be much faster than doing a full commit, and if a user cares about the order in which messages are published, they can disable the autocommit and trigger flushes manually.

      I'd anticipate this to look something like TaskCoordinator.flush(systemStream). It looks like the TaskCoordinator normally only queues up work, instead of doing it synchronously – if that's the case, it should be enough to buffer up all the requested flushes, then perform them in order when the moment comes.
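
      The buffering idea above could be sketched roughly as follows. This is only an illustration and not Samza code: BufferingCoordinator, StreamFlusher, and drainTo are hypothetical names, streams are identified by plain "system.stream" strings instead of a real SystemStream, and dropping duplicate requests is an assumption rather than something decided in this ticket.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FlushQueueSketch {

    /** Hypothetical sink; stands in for whatever actually flushes a producer. */
    interface StreamFlusher {
        void flush(String systemStream);
    }

    /** Hypothetical coordinator: it only records flush requests and does no I/O itself. */
    static class BufferingCoordinator {
        // LinkedHashSet keeps first-request order and drops repeated requests (an assumption).
        private final Set<String> pendingFlushes = new LinkedHashSet<>();

        /** The proposed call: queue a flush for one output stream. */
        void flush(String systemStream) {
            pendingFlushes.add(systemStream);
        }

        /** Would be called by the run loop "when the moment comes"; flushes in request order. */
        List<String> drainTo(StreamFlusher flusher) {
            List<String> toFlush = new ArrayList<>(pendingFlushes);
            pendingFlushes.clear();
            for (String systemStream : toFlush) {
                flusher.flush(systemStream);
            }
            return toFlush;
        }
    }

    public static void main(String[] args) {
        BufferingCoordinator coordinator = new BufferingCoordinator();
        coordinator.flush("kafka.alerts");
        coordinator.flush("kafka.metrics");
        coordinator.flush("kafka.alerts"); // repeated request, kept only once
        coordinator.drainTo(s -> System.out.println("flushing " + s));
    }
}
```

      In this sketch the task only ever touches flush(...); the actual work happens later in drainTo, which mirrors how commit requests appear to be queued today.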

      Note: you could get almost the same effect by switching to a synchronous system and letting the user send a batch of messages all at once, much as the underlying Kafka client does. This wouldn't let you flush a changelog stream, though.

            People

              Assignee: Unassigned
              Reporter: Ben Kirwin (bkirwi)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated: