JobCoordinator / JobModelManager does not need to fetch offsets for all stream partitions. It only needs the partition count of each stream in order to distribute the partitions among tasks.
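To illustrate why counts are enough, here is a minimal sketch of a group-by-partition style assignment that takes only a map of stream name to partition count. The class `PartitionCountGrouper`, its `group` method, and the stream names are hypothetical and not Samza's actual grouper API; the point is only that no offset appears anywhere in the computation.

```java
import java.util.*;

/**
 * Hypothetical sketch: assigns stream partitions to tasks using only each
 * stream's partition count. Task N receives partition N of every stream
 * that has at least N+1 partitions. No offsets are needed.
 */
public class PartitionCountGrouper {

  /** Returns task id -> list of "stream#partition" identifiers. */
  public static SortedMap<Integer, List<String>> group(Map<String, Integer> partitionCounts) {
    SortedMap<Integer, List<String>> tasks = new TreeMap<>();
    // Iterate streams in sorted order so the output is deterministic.
    for (Map.Entry<String, Integer> e : new TreeMap<>(partitionCounts).entrySet()) {
      for (int p = 0; p < e.getValue(); p++) {
        tasks.computeIfAbsent(p, k -> new ArrayList<>()).add(e.getKey() + "#" + p);
      }
    }
    return tasks;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = new HashMap<>();
    counts.put("PageViewEvent", 3); // hypothetical stream names
    counts.put("AdClickEvent", 2);
    // Prints {0=[AdClickEvent#0, PageViewEvent#0], 1=[AdClickEvent#1, PageViewEvent#1], 2=[PageViewEvent#2]}
    System.out.println(group(counts));
  }
}
```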
The cost of fetching offsets is that when a job consumes many topic partitions, the Samza job takes longer to boot up. If the yarn-am-liveness timeout is lower than the time the AM takes to boot up, the RM kills the application, so such a job may never be able to start up.
The underlying problem is the generic SystemAdmin method getSystemStreamMetadata, which fetches both the partition count AND the offset information. Separate interfaces for fetching each kind of information would give finer-grained control, so that callers fetch only what they actually need. A similar approach was used in SAMZA-882 to detect partition count changes in the input streams.
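A rough sketch of what such a split could look like, under stated assumptions: the interfaces `PartitionCountAdmin` and `OffsetAdmin`, their methods, and `FakeAdmin` are all hypothetical and are not part of Samza's SystemAdmin API. The in-memory fake counts one "offset lookup" per partition to stand in for the expensive broker round trips; job-model planning depends only on the cheap count interface, so it never triggers those lookups.

```java
import java.util.*;

/** Hypothetical: cheap call returning only the number of partitions per stream. */
interface PartitionCountAdmin {
  Map<String, Integer> getPartitionCounts(Set<String> streamNames);
}

/** Hypothetical: expensive call returning per-partition offsets. */
interface OffsetAdmin {
  Map<String, List<String>> getOldestOffsets(Set<String> streamNames);
}

/** In-memory stand-in for a real system admin; records how many offset lookups it performs. */
class FakeAdmin implements PartitionCountAdmin, OffsetAdmin {
  private final Map<String, Integer> counts;
  int offsetLookups = 0;

  FakeAdmin(Map<String, Integer> counts) { this.counts = counts; }

  @Override
  public Map<String, Integer> getPartitionCounts(Set<String> streamNames) {
    Map<String, Integer> result = new HashMap<>();
    for (String s : streamNames) result.put(s, counts.get(s));
    return result;
  }

  @Override
  public Map<String, List<String>> getOldestOffsets(Set<String> streamNames) {
    Map<String, List<String>> result = new HashMap<>();
    for (String s : streamNames) {
      List<String> offsets = new ArrayList<>();
      for (int p = 0; p < counts.get(s); p++) {
        offsetLookups++; // simulate one lookup per partition
        offsets.add("0");
      }
      result.put(s, offsets);
    }
    return result;
  }
}

public class SplitMetadataDemo {
  /** Job-model planning only needs counts, so it depends only on PartitionCountAdmin. */
  static int totalPartitions(PartitionCountAdmin admin, Set<String> streams) {
    int total = 0;
    for (int c : admin.getPartitionCounts(streams).values()) total += c;
    return total;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = new HashMap<>();
    counts.put("PageViewEvent", 64); // hypothetical streams and sizes
    counts.put("AdClickEvent", 32);
    FakeAdmin admin = new FakeAdmin(counts);
    int total = totalPartitions(admin, counts.keySet());
    // Prints "96 partitions, 0 offset lookups"
    System.out.println(total + " partitions, " + admin.offsetLookups + " offset lookups");
  }
}
```

Offsets would still be fetched where they are genuinely required (for example, by the containers when they start consuming), but that cost no longer sits on the AM's boot path.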