Details
- Type: Improvement
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 1.0, 1.1, 1.2
- Fix Version/s: None
- Component/s: None
- Labels: None
Description
The way ApplicationDescriptor is used in Samza leaves its semantics ambiguous: it is not clear what exactly it is meant to describe.
In Standalone, app descriptor is instantiated a single time, before planning, and wraps the user's provided and subsequently rewritten configs.
In YARN, however, app descriptor is instantiated once during the planning phase of deployment, and once again on each container the job is deployed on. What makes this even more confusing is that before planning, the app descriptor is instantiated with the user and rewritten configs, but on container startup it is instantiated with the final set of planned configs obtained from the JobModel in the AM. This makes it difficult to draw predictable inferences about how app descriptor is used throughout the codebase, because its usage and behavior become so dependent on context.
We should have answers to these questions:
1) What "stage" of the application do we want ApplicationDescriptor to describe? E.g., exclusively what the user provides (user config only, input/output stream and system descriptors, etc.), or some mix of user-provided and system-provided configs (e.g. rewritten or potentially planned configs). Whatever we decide should eventually be consistent between YARN and standalone.
2) Do we want to provide any guarantees to customers about the number of executions of SamzaApplication.describe()? Currently describe() is executed once in standalone, and a number of times proportional to the number of containers in YARN.
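To make the difference concrete, here is a minimal, self-contained sketch of the two paths as described above. It is a model only: `CountingApp`, `rewrite`, `plan`, and the `rewritten` / `planned` config keys are hypothetical stand-ins invented for illustration and are not Samza classes or real config names. It assumes YARN calls describe() once at planning time plus once per container, as the description states.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the two deployment paths described in this issue.
// None of these types are Samza classes; config keys are invented for illustration.
final class DescribeCountSketch {

    /** Stand-in for a user application; records how describe() is invoked. */
    static final class CountingApp {
        int describeCalls = 0;
        Map<String, String> lastConfigSeen = new HashMap<>();

        void describe(Map<String, String> config) {
            describeCalls++;
            lastConfigSeen = config;
        }
    }

    /** Stand-in for config rewriting: copies the user config and marks it rewritten. */
    static Map<String, String> rewrite(Map<String, String> userConfig) {
        Map<String, String> out = new HashMap<>(userConfig);
        out.put("rewritten", "true");
        return out;
    }

    /** Stand-in for planning: copies the rewritten config and marks it planned. */
    static Map<String, String> plan(Map<String, String> rewrittenConfig) {
        Map<String, String> out = new HashMap<>(rewrittenConfig);
        out.put("planned", "true");
        return out;
    }

    /** Standalone path: describe() runs once, before planning, with rewritten configs. */
    static int standalone(CountingApp app, Map<String, String> userConfig) {
        Map<String, String> rewritten = rewrite(userConfig);
        app.describe(rewritten);
        plan(rewritten); // planning happens afterwards; describe() never sees the result
        return app.describeCalls;
    }

    /** YARN path: describe() runs once at planning time, then once per container. */
    static int yarn(CountingApp app, Map<String, String> userConfig, int containers) {
        Map<String, String> rewritten = rewrite(userConfig);
        app.describe(rewritten); // planning phase: user + rewritten configs
        Map<String, String> planned = plan(rewritten);
        for (int i = 0; i < containers; i++) {
            app.describe(planned); // container startup: final planned configs
        }
        return app.describeCalls;
    }

    public static void main(String[] args) {
        Map<String, String> userConfig = new HashMap<>();
        userConfig.put("job.name", "example");
        System.out.println("standalone describe() calls: "
            + standalone(new CountingApp(), userConfig));
        System.out.println("yarn describe() calls (3 containers): "
            + yarn(new CountingApp(), userConfig, 3));
    }
}
```

In this model, `standalone` returns 1 and `yarn` with 3 containers returns 4, and the last config the app sees contains the `planned` marker only on the YARN path. Any side effect inside describe() would therefore behave differently depending on the deployment mode.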
Issue Links
- relates to SAMZA-2280 Standalone should depend on final, planned configs as in YARN (Open)