  1. Samza
  2. SAMZA-568

Start offset override in Task init

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.0
    • Component/s: container
    • Labels: None

      Description

      A couple of months back, on the mailing list, I mentioned a couple of offset-management issues I'd been having. (I'm happy to elaborate, but in short: I associate some extra state and ordering information with the input offsets, and keeping Samza's checkpoints in sync with my task's state carries a nontrivial performance cost.)

      It occurs to me now that there's a simple workaround for this: disable Samza's checkpointing entirely, and let `StreamTask.init` choose the starting offsets. The task can just keep its checkpoints in an ordinary StorageEngine – and by managing all the state from a single place, it's easy to keep everything in sync.
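To make the "single place" idea concrete, here is a minimal Java sketch in which a task records the last processed input offset in the same store as its own state. The class, the `__offset:` key prefix, and the in-memory map are hypothetical stand-ins for illustration; they are not Samza's StorageEngine API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain in-memory map stands in for the task's store.
public class OffsetInStoreSketch {
    // Assumed key prefix that keeps offset entries apart from ordinary state.
    public static final String OFFSET_KEY_PREFIX = "__offset:";

    private final Map<String, String> store = new HashMap<>();

    // Apply one message: update the task state and record the input offset
    // in the same store, so both live in a single place.
    public void process(String partition, String offset, String key, String value) {
        store.put(key, value);
        store.put(OFFSET_KEY_PREFIX + partition, offset);
    }

    // Last offset recorded for the partition, or null if none was recorded.
    public String lastOffset(String partition) {
        return store.get(OFFSET_KEY_PREFIX + partition);
    }

    // Read back a piece of ordinary task state.
    public String get(String key) {
        return store.get(key);
    }

    public static void main(String[] args) {
        OffsetInStoreSketch task = new OffsetInStoreSketch();
        task.process("input-0", "41", "user-7", "clicked");
        System.out.println(task.lastOffset("input-0"));
    }
}
```

With a real store, the two writes would need to be committed together (for example in one batch) for the offset and the state to stay consistent across a failure; the sketch leaves that out.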

      The basic implementation actually seems fairly straightforward – the consumers are not started until after the tasks are initialized, so all we'd need to do is allow the `init` method to override the starting offsets. I've attached a small patch that exposes this through the TaskContext interface, just to illustrate the idea – if this seems like an interesting feature for Samza, I'm happy to add more tests / documentation / etc.
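The ordering described above can be sketched roughly as follows. This is a self-contained Java illustration, not the attached patch: `Context`, `Task`, `setStartingOffset`, and `initThenResolveOffsets` are hypothetical names standing in for the TaskContext hook and the container start-up sequence.

```java
import java.util.HashMap;
import java.util.Map;

public class StartOffsetOverrideSketch {
    // Hypothetical stand-in for the context handed to a task's init method.
    public static class Context {
        private final Map<String, String> startingOffsets;

        public Context(Map<String, String> defaults) {
            // Copy so that overrides never mutate the caller's defaults.
            this.startingOffsets = new HashMap<>(defaults);
        }

        // Assumed hook: lets init replace the starting offset for a partition.
        public void setStartingOffset(String partition, String offset) {
            startingOffsets.put(partition, offset);
        }

        public Map<String, String> startingOffsets() {
            return startingOffsets;
        }
    }

    // Hypothetical task interface with only the init step.
    public interface Task {
        void init(Context context);
    }

    // Start-up order: initialize the task first, then read back the offsets
    // that consumers would be started from.
    public static Map<String, String> initThenResolveOffsets(
            Map<String, String> defaults, Task task) {
        Context context = new Context(defaults);
        task.init(context);
        return context.startingOffsets();
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("input-0", "100");
        defaults.put("input-1", "200");

        // The task overrides one partition and leaves the other at its default.
        Map<String, String> resolved = initThenResolveOffsets(
                defaults, ctx -> ctx.setStartingOffset("input-0", "42"));

        System.out.println(resolved);
    }
}
```

Because the offsets are only read after `init` returns, any partition the task does not override keeps its default starting offset.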

              People

              • Assignee: Ben Kirwin (bkirwi)
              • Reporter: Ben Kirwin (bkirwi)
