We are trying to upgrade our Kafka cluster from Kafka 0.11.0.1 to Kafka 1.0.1. After upgrading one node in the cluster, we noticed that the network threads use most of the CPU. It is a 3-node cluster handling 15k messages/sec on each node. With Kafka 0.11.0.1, typical CPU usage on the servers is around 50 to 60% (less than 1 vCPU). After the upgrade, CPU usage is high and scales with the number of network threads configured: with 8 network threads, CPU usage is around 850% (9 vCPUs), and with 4 network threads it is around 450% (5 vCPUs). The same Kafka server.properties is used for both versions.
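As a rough sanity check on the figures above, dividing total CPU usage by the number of network threads gives roughly one full core per thread in both configurations, which would be consistent with each network thread spinning. A minimal Python sketch of that arithmetic follows. The function name is hypothetical, and it assumes, for illustration only, that nearly all CPU usage is attributable to the network threads, as the profiling in this report suggests.

```python
# Figures reported in this issue: network thread count -> total broker CPU usage (%).
observed = {8: 850, 4: 450}


def cpu_per_network_thread(total_cpu_percent, network_threads):
    """Average CPU percent per network thread.

    Assumption (for illustration): all of the broker's CPU usage is spent
    in the network threads, so other threads are ignored.
    """
    return total_cpu_percent / network_threads


for threads, total in sorted(observed.items()):
    per_thread = cpu_per_network_thread(total, threads)
    print(f"{threads} network threads: {total}% total, about {per_thread:.1f}% per thread")
```

Both results are slightly above 100%, i.e. about one vCPU per network thread, with the remainder plausibly being the other broker threads.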
We did further analysis with git bisect and several build-and-deploy cycles, and traced the issue to commit 47ee8e954df62b9a79099e944ec4be29afe046f6. CPU usage is fine at commit f15cdbc91b240e656d9a2aeb6877e94624b21f8d, but it increases at commit 47ee8e954df62b9a79099e944ec4be29afe046f6. I have attached screenshots of profiling done with both commits. Screenshot Commit-f15cdbc91b-profile shows low CPU usage by the network threads, while screenshots Commit-47ee8e954-profile and Commit-47ee8e954-profile2 show much higher CPU usage by the network threads (almost the entire CPU usage). We also noticed that the kafka.network.Processor.poll() method is invoked about 10 times more often with commit 47ee8e954df62b9a79099e944ec4be29afe046f6.
We need this issue resolved before we can upgrade the cluster. Please let me know if you need any additional information.