Uploaded image for project: 'Samza'
  1. Samza
  2. SAMZA-334

Need for asymmetric container config

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: container
    • Labels:
      None

      Description

      The current (and upcoming) partitioning scheme(s) suggest that there might be a skew in the amount of data ingested and computation performed across different containers for a given Samza job. This directly affects the amount of resources required by a container - which today are completely symmetric.

      Case A] Partitioning on Kafka partitions
      For instance, consider a partitioner job which reads data from different Kafka topics (having different partition layouts). In this case, its possible that a lot of topics have a smaller number of Kafka partitions. Consequently the containers processing these partitions would need more resources than those responsible for the higher numbered partitions.

      Case B] Partitioning based on Kafka topics
      Even in this case, its very easy for some containers to be doing more work than others - leading to a skew in resource requirements.

      Today, the container config is based on the requirements for the worst (doing the most work) container. Needless to say, this leads to resource wastage. A better approach needs to consider what is the true requirement per container (instead of per job).

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                cpsoman Chinmay Soman
              • Votes:
                0 Vote for this issue
                Watchers:
                10 Start watching this issue

                Dates

                • Created:
                  Updated: