Uploaded image for project: 'Samza'
  1. Samza
  2. SAMZA-334

Need for asymmetric container config



    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.8.0
    • None
    • container
    • None


      The current (and upcoming) partitioning scheme(s) suggest that there might be a skew in the amount of data ingested and computation performed across different containers for a given Samza job. This directly affects the amount of resources required by a container - which today are completely symmetric.

      Case A] Partitioning on Kafka partitions
      For instance, consider a partitioner job which reads data from different Kafka topics (having different partition layouts). In this case, its possible that a lot of topics have a smaller number of Kafka partitions. Consequently the containers processing these partitions would need more resources than those responsible for the higher numbered partitions.

      Case B] Partitioning based on Kafka topics
      Even in this case, its very easy for some containers to be doing more work than others - leading to a skew in resource requirements.

      Today, the container config is based on the requirements for the worst (doing the most work) container. Needless to say, this leads to resource wastage. A better approach needs to consider what is the true requirement per container (instead of per job).


        Issue Links



              Unassigned Unassigned
              cpsoman Chinmay Soman
              0 Vote for this issue
              10 Start watching this issue