Currently, the only way to control the prefetch buffer is count-based, using the systems.system-name.samza.fetch.threshold configuration. However, when message sizes vary, this makes it very hard to allocate deterministic memory resources to a SamzaContainer.
This JIRA is for an improvement to allow tuning the prefetch buffer based on bytes as well, using a new config like systems.source.samza.fetch.bytes. When this config is present, the count-based threshold could safely be ignored.
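As a rough illustration, the two settings might sit side by side in a job config as below. The system name "kafka" and both values are made up for the example, and the byte-based key is only the name proposed in this ticket, not an existing option.

```properties
# Existing count-based limit on buffered messages (value is illustrative).
systems.kafka.samza.fetch.threshold=50000

# Proposed byte-based limit (hypothetical key from this ticket; 100 MB here).
# When set, the count-based threshold above would be ignored.
systems.kafka.samza.fetch.bytes=104857600
```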
This is an extremely important feature for us: having deterministic allocations has allowed us to stabilize our platform.
I have a patch for this against 0.9.1 running in canaries at scale in prod, and it is looking promising so far.