SAMZA-42: Add a job setup phase to Samza



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.6.0
    • Fix Version/s: None
    • Component/s: container
    • Labels: None


      We have several use cases for one-time setup at the beginning of a Samza job's execution, before any containers start. Examples:

      • Validate or create checkpoint topic (if using KafkaCheckpointManager)
      • Validate or create state topic (if using LoggedStore)

      Right now, we have to do this in the container, which creates a race condition when running on YARN: every container tries to create the same topic.
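To make the race concrete, here is a minimal sketch that simulates it. It assumes an in-memory stand-in for the broker; `FakeBroker`, `createTopic`, and `simulate` are hypothetical names invented for illustration and are not Samza or Kafka APIs. Each simulated container runs the same create step at startup, so exactly one creation succeeds and the rest find the topic already there.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TopicCreationRace {
    // Hypothetical in-memory stand-in for the broker's topic metadata.
    static final class FakeBroker {
        private final ConcurrentHashMap<String, Integer> topics = new ConcurrentHashMap<>();

        // Returns true if this call created the topic, false if it already existed.
        boolean createTopic(String name, int partitions) {
            return topics.putIfAbsent(name, partitions) == null;
        }
    }

    // Simulates `containers` containers that all run the same create step at startup.
    // Returns {number of successful creations, number of "already exists" outcomes}.
    static int[] simulate(int containers, String topic) {
        FakeBroker broker = new FakeBroker();
        AtomicInteger created = new AtomicInteger();
        AtomicInteger alreadyExisted = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(containers);
        for (int i = 0; i < containers; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // hold every container until all are ready
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (broker.createTopic(topic, 1)) {
                    created.incrementAndGet();
                } else {
                    alreadyExisted.incrementAndGet();
                }
            });
        }
        start.countDown(); // release all containers at once
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("simulation timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {created.get(), alreadyExisted.get()};
    }

    public static void main(String[] args) {
        int[] result = simulate(4, "checkpoint-topic");
        // prints "created=1 alreadyExisted=3"
        System.out.println("created=" + result[0] + " alreadyExisted=" + result[1]);
    }
}
```

With four containers, three of them lose the race. In a real deployment each loser would have to handle a creation failure or a partially created topic, which is the work a single setup phase would avoid.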

      Initially, I thought this logic could be put in the YARN AM, but then we'd have to put corresponding logic in the LocalJobFactory. This gets problematic if we implement SAMZA-41, since there would no longer be a central place to do a "before job starts" operation with the LocalJobFactory. If we don't do SAMZA-41, then we should be fine putting this logic in the YARN AM and LocalJobFactory.

      Alternatively, we could put this logic in JobRunner. One downside is that the JobRunner would need access to the full grid it is submitting to (not just the YARN ResourceManager) so that it could talk to Kafka/ZooKeeper, for example. I think this is actually fine, since we always launch our jobs from a host that has access to the full grid.
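A rough sketch of what a setup phase in the runner could look like follows. It assumes a hypothetical `SetupStep` hook and a hypothetical `JobSubmitter` stand-in for the YARN or local submission path; neither exists in Samza, and the real JobRunner may be structured quite differently. The runner executes each step once, in order, and submits the job only if every step succeeds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JobSetupPhaseSketch {
    // Hypothetical hook: one unit of one-time setup work (e.g. validate or create a topic).
    interface SetupStep {
        String name();
        void run();
    }

    // Hypothetical stand-in for whatever actually submits the job (YARN, local, ...).
    interface JobSubmitter {
        void submit();
    }

    // Runs every setup step exactly once, in order, then submits the job.
    // If a step throws, the exception propagates and the job is never submitted.
    static List<String> runWithSetup(List<SetupStep> steps, JobSubmitter submitter) {
        List<String> log = new ArrayList<>();
        for (SetupStep step : steps) {
            step.run();
            log.add("setup:" + step.name());
        }
        submitter.submit();
        log.add("submitted");
        return log;
    }

    // Small helper to build a step from a name and a body.
    static SetupStep step(String name, Runnable body) {
        return new SetupStep() {
            @Override public String name() { return name; }
            @Override public void run() { body.run(); }
        };
    }

    // Demo with no-op bodies standing in for the checkpoint and state topic checks.
    static List<String> demo() {
        List<SetupStep> steps = Arrays.asList(
            step("checkpoint-topic", () -> { /* validate or create checkpoint topic */ }),
            step("state-topic", () -> { /* validate or create state topic */ }));
        return runWithSetup(steps, () -> { /* hand the job to the cluster */ });
    }

    public static void main(String[] args) {
        // prints [setup:checkpoint-topic, setup:state-topic, submitted]
        System.out.println(demo());
    }
}
```

Because the steps run in the submitting process, they run once per job regardless of how many containers are launched, at the cost of that process needing network access to Kafka/ZooKeeper.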





              Assignee: Unassigned
              Reporter: Chris Riccomini (criccomini)