Currently, queue.size in the hadoop producer is 10MB. This means that KafkaRecordWriter will hit send on the Kafka producer once the size of the uncompressed queued messages exceeds 10MB. (The other condition on which the messages are sent is if their number exceeds Short.MAX_VALUE.)
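The flush behavior described above can be sketched roughly as follows. This is an illustrative model, not the actual KafkaRecordWriter code; the class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class QueueFlushSketch {
    // Hypothetical constants mirroring the current defaults described above.
    static final long QUEUE_SIZE_BYTES = 10L * 1024 * 1024; // queue.size = 10MB
    static final int MAX_QUEUED_MESSAGES = Short.MAX_VALUE;

    final List<byte[]> queue = new ArrayList<>();
    long queuedBytes = 0;

    /** True when the queued (uncompressed) messages should be sent. */
    boolean shouldFlush() {
        return queuedBytes > QUEUE_SIZE_BYTES || queue.size() >= MAX_QUEUED_MESSAGES;
    }

    void append(byte[] message) {
        queue.add(message);
        queuedBytes += message.length;
        if (shouldFlush()) {
            // A real writer would call producer.send(...) here, then reset.
            queue.clear();
            queuedBytes = 0;
        }
    }
}
```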
Considering that the server accepts a (compressed) batch of messages of size up to 1 million bytes minus the log overhead, we should probably reduce the queue size in the hadoop producer. We should do two things:
1. Change the max message size on the broker to 1 million bytes plus the log overhead, because that will make the client-side limit easy to remember. Right now the maximum number of bytes that can be accepted from a client in a batch of messages is an awkward 999988. (I don't have a stronger reason.) Since we have set the fetch size on the consumer to 1MB, this gives us plenty of room even if the log overhead increases in future versions.
2. Set the default queue size in the hadoop producer to 1 million bytes. Anyone who wants higher throughput can override this config using kafka.output.queue.size.
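For example, a job wanting larger batches could override the proposed 1-million-byte default in its job configuration; the value here is illustrative:

```
# kafka.output.queue.size is the override named above; 2000000 is an
# arbitrary example value for a higher-throughput job.
kafka.output.queue.size=2000000
```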