Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
0.8.0
-
None
-
None
Description
The current (and upcoming) partitioning scheme(s) suggest that there might be a skew in the amount of data ingested and computation performed across different containers for a given Samza job. This directly affects the amount of resources required by a container - which today are completely symmetric.
Case A] Partitioning on Kafka partitions
For instance, consider a partitioner job which reads data from different Kafka topics (having different partition layouts). In this case, its possible that a lot of topics have a smaller number of Kafka partitions. Consequently the containers processing these partitions would need more resources than those responsible for the higher numbered partitions.
Case B] Partitioning based on Kafka topics
Even in this case, its very easy for some containers to be doing more work than others - leading to a skew in resource requirements.
Today, the container config is based on the requirements for the worst (doing the most work) container. Needless to say, this leads to resource wastage. A better approach needs to consider what is the true requirement per container (instead of per job).
Attachments
Issue Links
- is related to
-
SAMZA-336 Dynamic load balancing
- Open