Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.3.1, 3.3.2
Fix Version/s: None
Description
Hi team,
Nice to meet you!
In our business, we have found two issues that need improvement:
(1) Sending the first message takes a long time
Sometimes users' interactions took a long time. We eventually traced the root cause: after we deploy or restart the servers, the first message sent by the Kafka client on each application server takes a long time to be delivered.
So we looked for a way to improve this.
We analyzed the source code of the first send and found that the time is spent fetching metadata before the message is sent. Subsequent sends are fast because the metadata is then cached. This logic is correct and necessary. Still, we want to improve the experience of the first send, which is also the user's first interaction.
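To make the cold-start effect concrete, here is a toy model (not Kafka code) of a per-topic metadata cache. The first lookup for a topic pays a simulated fetch cost and later lookups return immediately. The class name, the fixed partition count, and the use of 4669 ms (borrowed from the log further down) are illustrative assumptions only.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model, not the real Kafka client: the first lookup for a topic
 * "blocks" for a simulated metadata fetch; later lookups hit the cache.
 */
public class MetadataCacheSketch {
    private final Map<String, Integer> partitionCountByTopic = new HashMap<>();
    private final long fetchCostMs; // simulated round trip to the brokers
    private int fetchCount = 0;

    public MetadataCacheSketch(long fetchCostMs) {
        this.fetchCostMs = fetchCostMs;
    }

    /** Returns the simulated time (ms) the caller was blocked for this lookup. */
    public long lookup(String topic) {
        if (partitionCountByTopic.containsKey(topic)) {
            return 0L; // cache hit: no blocking
        }
        fetchCount++;
        partitionCountByTopic.put(topic, 3); // hypothetical partition count
        return fetchCostMs; // cache miss: caller waits for the fetch
    }

    public int fetchCount() {
        return fetchCount;
    }

    public static void main(String[] args) {
        MetadataCacheSketch cache = new MetadataCacheSketch(4669L);
        System.out.println("first send blocked:  " + cache.lookup("test_topic") + " ms");
        System.out.println("second send blocked: " + cache.lookup("test_topic") + " ms");
    }
}
```

Only the first lookup per topic pays the cost. This is why the delay shows up once per application server after each deploy or restart.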
(2) The send blocking time cannot be reduced to the desired value
Sometimes our application threads block for up to max.block.ms when sending a message. When we try to lower max.block.ms to reduce this blocking time, the lower value is sometimes not long enough to fetch the metadata. The root cause is that the configured max.block.ms is shared by the "get metadata" operation and the "send message" operation, as the following table shows:
| where it blocks | when it blocks | how long it can block |
| org.apache.kafka.clients.producer.KafkaProducer#waitOnMetadata | the first request that needs to load metadata from Kafka | up to max.block.ms |
| org.apache.kafka.clients.producer.internals.RecordAccumulator#append | at business peak time, when the network cannot send messages quickly enough | up to max.block.ms |
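As we read the producer's send path, the time spent in waitOnMetadata is subtracted from max.block.ms, and only the remainder is available to RecordAccumulator#append. The sketch below models that single shared budget in plain Java. The class and method names are hypothetical, and this is an illustration of the arithmetic, not the client's actual code.

```java
/**
 * Toy model of one max.block.ms budget shared by the metadata wait
 * and the buffer allocation in RecordAccumulator#append.
 */
public class BlockBudgetSketch {

    /** Time left for append after the metadata wait used part of the budget. */
    public static long remainingForAppendMs(long maxBlockMs, long metadataWaitMs) {
        return Math.max(0L, maxBlockMs - metadataWaitMs);
    }

    /** True if both waits together fit inside the single shared budget. */
    public static boolean fitsInBudget(long maxBlockMs, long metadataWaitMs, long appendWaitMs) {
        return metadataWaitMs <= maxBlockMs
                && appendWaitMs <= remainingForAppendMs(maxBlockMs, metadataWaitMs);
    }

    public static void main(String[] args) {
        // Cold start: metadata fetch of ~4669 ms (from the log) vs max.block.ms = 2000.
        System.out.println("cold start fits: " + fitsInBudget(2000, 4669, 0));
        // Warm cache: no metadata wait, so append may use the whole 2000 ms.
        System.out.println("warm cache fits: " + fitsInBudget(2000, 0, 1500));
    }
}
```

With a cold cache, a 2 s budget cannot cover a ~4.7 s metadata fetch. With a warm cache, the whole budget is left for the append path.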
What are the possible solutions to these two issues?
After reviewing the current logic, I came up with the following possible solutions:
(1) Send a "warmup" message. However, we cannot send any fake messages, so this does not work for us.
(2) Provide an extra configuration dedicated to the metadata fetch timeout. However, this slightly changes the definition of max.block.ms, and it only solves issue 2, not issue 1.
(3) Add a method that calls waitOnMetadata with its own timeout instead of max.block.ms (PR: KAFKA-14768: provide new method to warmup first record's sending and reduce the max.block.ms safely by jiafu1115 · Pull Request #13320 · apache/kafka (github.com))
Note: the signature of org.apache.kafka.clients.producer.KafkaProducer#waitOnMetadata is:
ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long nowMs, long maxWaitMs)
With this change, we can call the new method before the service is marked as ready. Once the service is ready, sends no longer block on fetching metadata because it is already cached. We can then safely lower max.block.ms to reduce thread blocking time.
After adopting solution 3, we solved both issues. For example, we reduced the time to send the first message by about 4 seconds. See the following log:
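The startup flow described above can be modeled as follows: warm up with a generous, dedicated timeout before the service is marked ready, then serve traffic with a small max.block.ms. In this toy model, `warmup` stands in for the method proposed in solution 3. It is hypothetical and not an existing KafkaProducer API, and the simulated fetch time and timeout values are assumptions chosen to match the numbers in this issue.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Toy model of solution (3). "warmup" is a hypothetical stand-in for the
 * proposed method; it is NOT part of the current KafkaProducer API.
 */
public class WarmupSketch {
    private final long maxBlockMs;      // value max.block.ms would be set to
    private final long metadataFetchMs; // simulated cold metadata fetch time
    private final Set<String> cachedTopics = new HashSet<>();

    public WarmupSketch(long maxBlockMs, long metadataFetchMs) {
        this.maxBlockMs = maxBlockMs;
        this.metadataFetchMs = metadataFetchMs;
    }

    /** Fetches metadata with its own timeout; true if it finished in time. */
    public boolean warmup(String topic, long timeoutMs) {
        return fetchMetadata(topic, timeoutMs);
    }

    /** A normal send may only wait up to max.block.ms for metadata. */
    public boolean send(String topic) {
        return fetchMetadata(topic, maxBlockMs);
    }

    private boolean fetchMetadata(String topic, long budgetMs) {
        if (cachedTopics.contains(topic)) {
            return true; // cached: no wait at all
        }
        if (metadataFetchMs > budgetMs) {
            return false; // the fetch would exceed the budget (timeout)
        }
        cachedTopics.add(topic);
        return true;
    }

    public static void main(String[] args) {
        WarmupSketch producer = new WarmupSketch(2000, 4669);
        // Before marking the service ready: use a generous, dedicated timeout.
        boolean warmed = producer.warmup("test_topic", 10_000);
        // After ready: sends hit the cache even with max.block.ms = 2000.
        System.out.println("warmed=" + warmed + ", send ok=" + producer.send("test_topic"));
    }
}
```

Keeping the warmup timeout separate from max.block.ms is the point of the design. The long wait happens once, off the request path, and the request path keeps a tight bound.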
warmup test_topic at phase phase 2: get metadata from mq start
warmup test_topic at phase phase 2: get metadata from mq end consume 4669ms
After the change, we also reduced max.block.ms from 10s to 2s without worrying about failing to get metadata.
What do you think of these two issues and the solution I proposed? I look forward to your feedback.