Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Duplicate
-
3.6.0
-
None
-
None
Description
Using this docker-compose.yml:
version: "3.7" services: kafka: image: bitnami/kafka:latest network_mode: host environment: KAFKA_ENABLE_KRAFT: yes KAFKA_CFG_PROCESS_ROLES: controller,broker KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093 KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@127.0.0.1:9093 # Set this to "PLAINTEXT://127.0.0.1:9092" if you want to run this container on localhost via Docker KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092 KAFKA_CFG_NODE_ID: 1 ALLOW_PLAINTEXT_LISTENER: yes KAFKA_KRAFT_CLUSTER_ID: XkpGZQ27R3eTl3OdTm2LYA # 16 byte base64-encoded UUID
And running franz-go integration tests with KGO_TEST_RF=1, I consistently receive INVALID_RECORD errors.
Looking at the container logs, I see these problematic log lines:
2023-10-19 23:33:47,942] ERROR [ReplicaManager broker=1] Error processing append operation on partition 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-0 (kafka.server.ReplicaManager) org.apache.kafka.common.InvalidRecordException: Invalid negative header key size -25 [2023-10-19 23:33:47,942] ERROR [ReplicaManager broker=1] Error processing append operation on partition 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-6 (kafka.server.ReplicaManager) org.apache.kafka.common.InvalidRecordException: Reached end of input stream before skipping all bytes. Remaining bytes:94 [2023-10-19 23:33:47,942] ERROR [ReplicaManager broker=1] Error processing append operation on partition 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-1 (kafka.server.ReplicaManager) org.apache.kafka.common.InvalidRecordException: Found invalid number of record headers -26 [2023-10-19 23:33:47,948] ERROR [ReplicaManager broker=1] Error processing append operation on partition 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-6 (kafka.server.ReplicaManager) org.apache.kafka.common.InvalidRecordException: Found invalid number of record headers -27 [2023-10-19 23:33:47,950] ERROR [ReplicaManager broker=1] Error processing append operation on partition 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-22 (kafka.server.ReplicaManager) org.apache.kafka.common.InvalidRecordException: Invalid negative header key size -25 [2023-10-19 23:33:47,947] ERROR [ReplicaManager broker=1] Error processing append operation on partition c63b6e30987317fad18815effb8d432b6df677d2ab56cf6da517bb93fa49b74b-25 (kafka.server.ReplicaManager) org.apache.kafka.common.InvalidRecordException: Found invalid number of record headers -50 [2023-10-19 23:33:47,959] ERROR [ReplicaManager broker=1] Error processing append operation on partition c63b6e30987317fad18815effb8d432b6df677d2ab56cf6da517bb93fa49b74b-25 (kafka.server.ReplicaManager)
I modified franz-go with a diff to print the request that was written to the wire once this error occurs. Attached is a v9 produce request. I deserialized it locally and am not seeing the corrupt data that Kafka is printing. It's possible there is a bug in the client, but again, these tests have never received this error pre-Kafka 3.6. It looks like there is either corruption when processing the incoming data, or there is some problematic race condition in the broker - I'm not sure which.