Samza / SAMZA-348

Configure Samza jobs through a stream



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 0.7.0
    • None
    • None


      Samza's existing config setup is problematic for a number of reasons:

      1. It's completely immutable once a job starts. This prevents any dynamic reconfiguration and auto-scaling. It is debatable whether we want these features or not, but our existing implementation actively prevents them. See SAMZA-334 for discussion.
      2. We pass existing configuration through environment variables. YARN exports environment variables in a shell script, which limits the configuration's size to the machine's maximum argument length. This is usually ~128KB. See SAMZA-333 and SAMZA-337 for details.
      3. User-defined configuration (the Config object) and programmatic configuration (checkpoints and TaskName:State mappings; see SAMZA-123) are handled differently. It's debatable whether this makes sense.
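
      To make the size limit in point 2 concrete, here is a rough sketch that estimates how large a config would be if it were serialized into a single environment variable. The JSON encoding, the helper names, and the fixed 128KB threshold are assumptions for illustration only; the real limit depends on the operating system and on how the config is actually encoded.

```python
import json

# Assumed threshold, taken from the ~128KB figure quoted above.
# The actual limit varies by OS and kernel version.
ASSUMED_LIMIT_BYTES = 128 * 1024


def serialized_config_size(config: dict) -> int:
    """Return the size in bytes of the config when JSON-encoded as UTF-8."""
    return len(json.dumps(config, separators=(",", ":")).encode("utf-8"))


def fits_in_environment(config: dict, limit: int = ASSUMED_LIMIT_BYTES) -> bool:
    """Return True if the serialized config is smaller than the assumed limit."""
    return serialized_config_size(config) < limit


# Illustrative configs: a small one, and one with many generated entries.
small = {"job.name": "example-job", "task.inputs": "kafka.example-topic"}
large = {f"task.key.{i}": "x" * 100 for i in range(2000)}

print(fits_in_environment(small))
print(fits_in_environment(large))
```

      A job with a few thousand per-task entries can cross the threshold quickly, which is the failure mode described in SAMZA-333 and SAMZA-337.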

      In SAMZA-123, jghoman and I propose implementing a ConfigLog. This log would replace both the checkpoint topic and the existing config environment variables in SamzaContainer and Samza's YARN AM.
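
      As a minimal sketch of the general idea, the config can be modeled as an append-only log of key/value records that a container or the AM replays to rebuild its current view. This is not the design from the attached documents: the class name, the in-memory list standing in for a stream, the use of None as a delete marker, and the example keys are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


class InMemoryConfigLog:
    """Hypothetical stand-in for a config stream: an append-only record list."""

    def __init__(self) -> None:
        self._records: List[Tuple[str, Optional[str]]] = []

    def append(self, key: str, value: Optional[str]) -> int:
        """Append a record and return its offset. A None value deletes the key."""
        self._records.append((key, value))
        return len(self._records) - 1

    def replay(self, up_to_offset: Optional[int] = None) -> Dict[str, str]:
        """Rebuild the config by applying records in order; the latest write wins."""
        end = len(self._records) if up_to_offset is None else up_to_offset + 1
        config: Dict[str, str] = {}
        for key, value in self._records[:end]:
            if value is None:
                config.pop(key, None)
            else:
                config[key] = value
        return config


log = InMemoryConfigLog()
log.append("job.name", "example-job")
first_checkpoint = log.append("example.checkpoint.partition-0", "100")
log.append("example.checkpoint.partition-0", "250")  # overwrites the earlier value
log.append("example.temporary.flag", "true")
log.append("example.temporary.flag", None)  # removes the key again

current = log.replay()
earlier = log.replay(up_to_offset=first_checkpoint)
```

      Because both user-defined settings and checkpoint-style values are ordinary records here, the two kinds of configuration from point 3 go through the same path, and a reader that keeps consuming the log would see updates after the job starts.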

      I'd like to limit this ticket's scope to implementing the ConfigLog, not redesigning how Samza's config is used in the code (SAMZA-40). We should, however, discuss how this feature would affect dynamic reconfiguration/auto-scaling.


        1. DESIGN-SAMZA-348-1.pdf
          304 kB
          Chris Riccomini
        2. DESIGN-SAMZA-348-1.md
          45 kB
          Chris Riccomini
        3. DESIGN-SAMZA-348-0.pdf
          220 kB
          Chris Riccomini
        4. DESIGN-SAMZA-348-0.md
          30 kB
          Chris Riccomini

              Assignee: Chris Riccomini (criccomini)
              Reporter: Chris Riccomini (criccomini)