Details

Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 2.2.0
Fix Version/s: None
Component/s: None
Description
We noticed this behaviour while testing the console producer against a Kafka service installed on GCP. We have been using a fork of the Confluent Helm chart:
https://github.com/helm/charts/tree/master/incubator/kafka
FYI - we used CP 5.3.0 with Apache Kafka 2.3.0.
Our VPN connection throughput was 1 Mbps. After connecting to the VPN, we opened a console producer client (2.2.0) with the following command:
kafka-console-producer.bat --topic some_topic --broker-list gcp_broker1:19092,gcp_broker2:19092,gcp_broker3:19092
Similarly, we ran a consumer with the following command before publishing messages:
kafka-console-consumer.bat --topic some_topic --bootstrap-server gcp_broker1:19092,gcp_broker2:19092,gcp_broker3:19092
In the producer console, we received a caret (>) prompt for publishing, so we entered messages:
>one
>two
>three
>
After a while, it responded with NETWORK_EXCEPTION:
[2019-08-02 11:17:19,690] WARN [Producer clientId=console-producer] Got error produce response with correlation id 8 on topic-partition some_topic-0, retrying (2 attempts left). Error: NETWORK_EXCEPTION (org.apache.kafka.clients.producer.internals.Sender)
We then hit "Enter" and received a caret (>) back:
[2019-08-02 11:17:19,690] WARN [Producer clientId=console-producer] Got error produce response with correlation id 8 on topic-partition some_topic-0, retrying (2 attempts left). Error: NETWORK_EXCEPTION (org.apache.kafka.clients.producer.internals.Sender) >
Immediately after that, in the consumer window, we received the following:
three
two
one
We ran the same exercise from a regular network (Wi-Fi/LAN) and didn't see this issue (i.e. it works as described in the Quickstart).
This is slightly concerning for us, since tunnelling through a VPN shouldn't have any impact (or should it?) on how the Kafka message protocol works over TCP. It seems that Kafka couldn't guarantee message ordering once network latency and transient errors were involved.
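The observed order is consistent with a producer that keeps several unacknowledged batches in flight and retries transient failures: if the first batch hits a NETWORK_EXCEPTION while later batches succeed, the retried batch lands last. A minimal sketch of that interaction in plain Python (illustrative only; no Kafka client is used, and the function name is ours):

```python
# Illustrative model of a producer that keeps up to `max_in_flight`
# unacknowledged batches on the wire and retries transient failures.

def deliver(batches, fails_once, max_in_flight):
    """Return the order in which batches reach the broker.

    batches: messages in the order the producer sent them
    fails_once: messages whose first attempt hits a transient error
        (e.g. NETWORK_EXCEPTION) and is retried
    max_in_flight: max unacknowledged batches per connection
    """
    arrived, already_failed = [], set()
    queue = list(batches)
    while queue:
        window, queue = queue[:max_in_flight], queue[max_in_flight:]
        for batch in window:
            if batch in fails_once and batch not in already_failed:
                already_failed.add(batch)
                queue.insert(0, batch)  # retried on the next round trip
            else:
                arrived.append(batch)
    return arrived

# Several batches in flight: the retried batch lands after the ones
# it was originally sent before -> reordering.
print(deliver(["one", "two", "three"], {"one"}, 5))  # ['two', 'three', 'one']

# With a single in-flight request, order is preserved.
print(deliver(["one", "two", "three"], {"one"}, 1))  # ['one', 'two', 'three']
```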
FYI:
1) We tried on the VPN with --request-timeout-ms 120000 and still got the same results.
2) Our setup was 3 nodes (3 brokers, 3 ZooKeeper nodes), with every topic having 1 partition only (RF = 3).
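Not part of the original report, but possibly relevant: the Kafka producer documents that with retries enabled and max.in.flight.requests.per.connection greater than 1, a failed-and-retried batch can be reordered behind later batches. Producer settings that may be worth testing on this setup (our assumption, not verified here):

```properties
# Allow only one unacknowledged request per connection, so a retried
# batch cannot land after batches sent later:
max.in.flight.requests.per.connection=1

# Alternatively (Kafka 0.11+), the idempotent producer preserves
# ordering even in the presence of retries:
enable.idempotence=true
```

With the console producer, such settings can be passed via --producer-property, e.g. --producer-property max.in.flight.requests.per.connection=1.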