Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
Currently, the metadata of a Samza job is stored in a Kafka topic called the coordinator stream.
In the samza-yarn ApplicationMaster, the same coordinator stream is read twice during the startup sequence. The duplicate read prolongs ApplicationMaster startup and delays container allocation. That in turn delays input stream processing, by an amount that grows with the size of the coordinator stream.
To mitigate this problem, the two utility classes in Samza, namely `ChangelogPartitionManager` and `Config`, should be migrated to the MetadataStore abstraction. This ticket tracks the work involved in that migration.
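As a rough illustration of the idea, and not Samza's actual code, the sketch below shows two readers sharing one store that loads the stream a single time. Every name in it (`SketchMetadataStore`, `FakeCoordinatorStream`, `StreamBackedStore`, `ConfigReader`, `ChangelogMappingReader`, and the `config:` / `changelog:` key prefixes) is hypothetical and chosen only for this example; the real MetadataStore interface and key layout may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a metadata store abstraction.
interface SketchMetadataStore {
    byte[] get(String key);
    Map<String, byte[]> all();
}

// In-memory stand-in for the coordinator stream that counts full reads.
class FakeCoordinatorStream {
    private final Map<String, byte[]> messages = new LinkedHashMap<>();
    private int fullReads = 0;

    void append(String key, String value) {
        messages.put(key, value.getBytes(StandardCharsets.UTF_8));
    }

    Map<String, byte[]> readAll() {
        fullReads++; // each call models one complete scan of the topic
        return new LinkedHashMap<>(messages);
    }

    int fullReads() {
        return fullReads;
    }
}

// Loads the stream once, on first access, then serves all reads from memory.
class StreamBackedStore implements SketchMetadataStore {
    private final FakeCoordinatorStream stream;
    private Map<String, byte[]> cache;

    StreamBackedStore(FakeCoordinatorStream stream) {
        this.stream = stream;
    }

    private Map<String, byte[]> loaded() {
        if (cache == null) {
            cache = stream.readAll();
        }
        return cache;
    }

    @Override
    public byte[] get(String key) {
        return loaded().get(key);
    }

    @Override
    public Map<String, byte[]> all() {
        return Collections.unmodifiableMap(loaded());
    }
}

// Hypothetical reader for job configuration entries.
class ConfigReader {
    private final SketchMetadataStore store;

    ConfigReader(SketchMetadataStore store) {
        this.store = store;
    }

    String get(String key) {
        byte[] value = store.get("config:" + key);
        return value == null ? null : new String(value, StandardCharsets.UTF_8);
    }
}

// Hypothetical reader for task-to-changelog-partition entries.
class ChangelogMappingReader {
    private final SketchMetadataStore store;

    ChangelogMappingReader(SketchMetadataStore store) {
        this.store = store;
    }

    Integer partitionFor(String taskName) {
        byte[] value = store.get("changelog:" + taskName);
        return value == null ? null : Integer.valueOf(new String(value, StandardCharsets.UTF_8));
    }
}

class CoordinatorReadSketch {
    public static void main(String[] args) {
        FakeCoordinatorStream stream = new FakeCoordinatorStream();
        stream.append("config:job.name", "demo-job");
        stream.append("changelog:task-0", "3");

        // Both readers share the same store instance.
        SketchMetadataStore store = new StreamBackedStore(stream);
        System.out.println(new ConfigReader(store).get("job.name"));
        System.out.println(new ChangelogMappingReader(store).partitionFor("task-0"));
        System.out.println("full reads: " + stream.fullReads());
    }
}
```

In this sketch, the point is that both readers depend on the store interface rather than on the stream itself, so the number of full scans is decided in one place instead of once per reader.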