Currently, Drill allocates memory for outgoing value vectors in one of two ways: either for the maximum of 64K entries up front, or starting at 4096 entries and doubling as more memory is needed. Each doubling allocates a new vector, copies the existing data into it, and zero-fills the new half, all of which carries a performance penalty. As part of the batch sizing project, we limit the number of rows in an outgoing batch based on memory, using sizing information from the incoming batch(es). Since we therefore know the number of rows and the average size of each column in the outgoing batch, we should use that information to preallocate memory for the outgoing vectors. This will be done as each operator is changed to produce batches of the configured size.
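To illustrate the cost of the doubling strategy compared with preallocation, here is a minimal sketch in Java. It is a simplified model, not Drill's allocator code: the class VectorGrowthModel, its method names, and the fixed per-entry width are all hypothetical and chosen only for illustration.

```java
// Hypothetical model of the two allocation strategies described above.
// None of these names come from Drill; entry widths are assumed fixed.
final class VectorGrowthModel {
    // Starting capacity (in entries) of the doubling strategy.
    static final int INITIAL_ENTRIES = 4096;

    // Number of times the vector must be doubled (reallocated and copied)
    // before it can hold rowCount entries.
    static int reallocCount(int rowCount) {
        long capacity = INITIAL_ENTRIES;
        int reallocs = 0;
        while (capacity < rowCount) {
            capacity *= 2;
            reallocs++;
        }
        return reallocs;
    }

    // Total bytes copied from old buffers into new ones across all doublings.
    static long bytesCopied(int rowCount, int entryWidth) {
        long capacity = INITIAL_ENTRIES;
        long copied = 0;
        while (capacity < rowCount) {
            copied += capacity * entryWidth; // old contents move to the new buffer
            capacity *= 2;
        }
        return copied;
    }

    // Bytes needed when the row count and average entry width are known up front,
    // so the vector can be sized once with no copying.
    static long preallocatedBytes(int rowCount, int avgEntryWidth) {
        return (long) rowCount * avgEntryWidth;
    }
}
```

In this model, growing a vector of 4-byte entries from 4096 to 65536 entries takes four doublings and copies 245760 bytes along the way, whereas sizing it once from the known row count involves no copy at all.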
Another possible improvement is to pack the value vectors as densely as possible to improve overall memory utilization. Since memory is allocated in powers of 2, once we have determined the number of rows to include in the outgoing batch, we can round that number down to the nearest power of 2 and allocate memory for that many rows.
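The rounding step can be sketched as follows. This is only an illustration under the assumption that allocation sizes round up to the next power of 2 in entries; the class RowCountRounding and its methods are hypothetical names, not part of Drill. The only library call used is the standard Integer.highestOneBit.

```java
// Hypothetical helper illustrating the power-of-2 rounding idea.
// Assumes the allocator rounds any requested entry count up to a power of 2.
final class RowCountRounding {
    // Largest power of 2 that is less than or equal to rowCount.
    static int roundDownToPowerOfTwo(int rowCount) {
        if (rowCount <= 0) {
            throw new IllegalArgumentException("rowCount must be positive");
        }
        return Integer.highestOneBit(rowCount);
    }

    // Smallest power of 2 that is greater than or equal to rowCount,
    // modelling what a power-of-2 allocator would actually reserve.
    static long roundUpToPowerOfTwo(int rowCount) {
        if (rowCount <= 0) {
            throw new IllegalArgumentException("rowCount must be positive");
        }
        long floor = Integer.highestOneBit(rowCount);
        return floor == rowCount ? floor : floor << 1;
    }

    // Fraction of the reserved entries that actually hold rows.
    static double utilization(int rowCount) {
        return (double) rowCount / roundUpToPowerOfTwo(rowCount);
    }
}
```

For example, a batch of 5000 rows would occupy a vector sized for 8192 entries, leaving roughly 39% of it unused, while rounding the batch down to 4096 rows fills the vector completely.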