Samza's existing config setup is problematic for a number of reasons:
- It's completely immutable once a job starts. This prevents any dynamic reconfiguration and auto-scaling. It is debatable whether we want these features or not, but our existing implementation actively prevents them. See SAMZA-334 for discussion.
- We pass existing configuration through environment variables. YARN exports environment variables in a shell script, which limits the config's size to the maximum argument length on the machine. This is usually ~128KB. See SAMZA-333 and
- User-defined configuration (the Config object) and programmatic configuration (checkpoints and TaskName:State mappings (see SAMZA-123)) are handled differently. It's debatable whether this makes sense.
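To make the size concern in the second point concrete, here is a rough sketch of how one might check whether a config would fit under the limit. It assumes the config is serialized as compact JSON and uses the ~128KB figure quoted above as the budget; both are assumptions for illustration, and the helper names and example keys are hypothetical rather than taken from Samza's code.

```python
import json

# Assumed budget, based on the "~128KB" figure above; the real limit is machine-dependent.
ENV_LIMIT_BYTES = 128 * 1024


def serialized_size(config: dict) -> int:
    """Return the size in bytes of the config encoded as compact JSON (assumed encoding)."""
    return len(json.dumps(config, separators=(",", ":")).encode("utf-8"))


def fits_in_env(config: dict, limit: int = ENV_LIMIT_BYTES) -> bool:
    """Return True if the serialized config is within the assumed size budget."""
    return serialized_size(config) <= limit


# A small job config (illustrative keys and values).
small = {"job.name": "example-job", "task.inputs": "kafka.example-topic"}

# A config with thousands of entries, large enough to exceed the assumed budget.
large = {f"stores.store{i}.changelog": f"kafka.changelog-topic-{i}" for i in range(5000)}

for name, cfg in (("small", small), ("large", large)):
    print(name, serialized_size(cfg), "bytes, fits:", fits_in_env(cfg))
```

A check like this only estimates the problem; it does not account for other environment variables or for shell quoting, which would consume part of the same budget.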
I'd like to keep this ticket's scope limited to just the implementation of the ConfigLog, and not re-designing how Samza's config is used in the code (SAMZA-40). We should, however, discuss how this feature would affect dynamic reconfiguration/auto-scaling.
|Integrate CoordinatorStream to use SystemConsumers and SystemProducers||Open||Unassigned|
|Optimize CoordinatorStream's bootstrap mechanism||Open||Unassigned|
|Explicit restart containers to pick up dynamic JobModel changes||Open|