Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-4961

Mirrormaker crash with org.apache.kafka.common.protocol.types.SchemaException

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Won't Fix
    • 0.9.0.1
    • None
    • consumer

    Description

      We are running a cluster of 3 brokers and using mirrormaker to replicate a topic to a different 3-broker cluster. Occasionally we find that when the source cluster is under heavy load with lots of messages coming in, mirrormaker will crash with SchemaException.

      27 Mar 2017 19:02:22.030 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.NetworkClient handleTimedOutRequests(line:399) Disconnecting from node 5 due to request timeout.
      27 Mar 2017 19:02:22.032 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.NetworkClient handleTimedOutRequests(line:399) Disconnecting from node 7 due to request timeout.
      27 Mar 2017 19:02:22.033 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient onComplete(line:376) Cancelled FETCH request ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@96db60c9, request=RequestSend(header={api_key=1,api_version=1,correlation_id=76978,client_id=dx-stg02-wdc04-0}, body={replica_id=-1,max_wait_time=500,min_bytes=1,topics=[{topic=logging,partitions=[{partition=0,fetch_offset=129037541,max_bytes=1048576},{partition=1,fetch_offset=120329329,max_bytes=1048576},{partition=33,fetch_offset=125526115,max_bytes=1048576},{partition=36,fetch_offset=125526627,max_bytes=1048576},{partition=5,fetch_offset=121654333,max_bytes=1048576},{partition=37,fetch_offset=120262628,max_bytes=1048576},{partition=9,fetch_offset=125568321,max_bytes=1048576},{partition=41,fetch_offset=121593740,max_bytes=1048576},{partition=12,fetch_offset=125563836,max_bytes=1048576},{partition=13,fetch_offset=122044962,max_bytes=1048576},{partition=45,fetch_offset=125504213,max_bytes=1048576},{partition=48,fetch_offset=125506892,max_bytes=1048576},{partition=17,fetch_offset=121635934,max_bytes=1048576},{partition=49,fetch_offset=121985309,max_bytes=1048576},{partition=21,fetch_offset=125549718,max_bytes=1048576},{partition=24,fetch_offset=125548506,max_bytes=1048576},{partition=25,fetch_offset=120289719,max_bytes=1048576},{partition=29,fetch_offset=121612535,max_bytes=1048576}]}]}), createdTimeMs=1490641301465, sendTimeMs=1490641301465) with correlation id 76978 due to node 5 being disconnected
      27 Mar 2017 19:02:22.035 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher onFailure(line:144) Fetch failed
      org.apache.kafka.common.errors.DisconnectException: null
      27 Mar 2017 19:02:22.037 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient onComplete(line:376) Cancelled FETCH request ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@fcb8c50d, request=RequestSend(header={api_key=1,api_version=1,correlation_id=76980,client_id=dx-stg02-wdc04-0}, body={replica_id=-1,max_wait_time=500,min_bytes=1,topics=[{topic=logging,partitions=[{partition=32,fetch_offset=125478125,max_bytes=1048576},{partition=2,fetch_offset=121280695,max_bytes=1048576},{partition=3,fetch_offset=125515146,max_bytes=1048576},{partition=35,fetch_offset=121216188,max_bytes=1048576},{partition=38,fetch_offset=121220634,max_bytes=1048576},{partition=7,fetch_offset=121634123,max_bytes=1048576},{partition=39,fetch_offset=125464566,max_bytes=1048576},{partition=8,fetch_offset=125515210,max_bytes=1048576},{partition=11,fetch_offset=121257359,max_bytes=1048576},{partition=43,fetch_offset=121571984,max_bytes=1048576},{partition=44,fetch_offset=125455538,max_bytes=1048576},{partition=14,fetch_offset=121264791,max_bytes=1048576},{partition=15,fetch_offset=125495034,max_bytes=1048576},{partition=47,fetch_offset=121199057,max_bytes=1048576},{partition=19,fetch_offset=121613792,max_bytes=1048576},{partition=20,fetch_offset=125495807,max_bytes=1048576},{partition=23,fetch_offset=121237155,max_bytes=1048576},{partition=26,fetch_offset=121249178,max_bytes=1048576},{partition=27,fetch_offset=125317927,max_bytes=1048576},{partition=31,fetch_offset=121591702,max_bytes=1048576}]}]}), createdTimeMs=1490641301466, sendTimeMs=1490641301466) with correlation id 76980 due to node 7 being disconnected
      27 Mar 2017 19:02:22.038 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher onFailure(line:144) Fetch failed
      org.apache.kafka.common.errors.DisconnectException: null
      27 Mar 2017 19:02:22.039 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.NetworkClient maybeUpdate(line:619) Sending metadata request ClientRequest(expectResponse=true, callback=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=88357,client_id=dx-stg02-wdc04-0}, body={topics=[]}), isInitiatedByNetworkClient, createdTimeMs=1490641342039, sendTimeMs=0) to node 6
      27 Mar 2017 19:02:22.040 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient onComplete(line:376) Cancelled FETCH request ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@5d530ce8, request=RequestSend(header={api_key=1,api_version=1,correlation_id=88355,client_id=dx-stg02-wdc04-0}, body={replica_id=-1,max_wait_time=500,min_bytes=1,topics=[{topic=logging,partitions=[{partition=0,fetch_offset=129037541,max_bytes=1048576},{partition=1,fetch_offset=120329329,max_bytes=1048576},{partition=33,fetch_offset=125526115,max_bytes=1048576},{partition=36,fetch_offset=125526627,max_bytes=1048576},{partition=5,fetch_offset=121654333,max_bytes=1048576},{partition=37,fetch_offset=120262628,max_bytes=1048576},{partition=9,fetch_offset=125568321,max_bytes=1048576},{partition=41,fetch_offset=121593740,max_bytes=1048576},{partition=12,fetch_offset=125563836,max_bytes=1048576},{partition=13,fetch_offset=122044962,max_bytes=1048576},{partition=45,fetch_offset=125504213,max_bytes=1048576},{partition=48,fetch_offset=125506892,max_bytes=1048576},{partition=17,fetch_offset=121635934,max_bytes=1048576},{partition=49,fetch_offset=121985309,max_bytes=1048576},{partition=21,fetch_offset=125549718,max_bytes=1048576},{partition=24,fetch_offset=125548506,max_bytes=1048576},{partition=25,fetch_offset=120289719,max_bytes=1048576},{partition=29,fetch_offset=121612535,max_bytes=1048576}]}]}), createdTimeMs=1490641342039, sendTimeMs=0) with correlation id 88355 due to node 5 being disconnected
      27 Mar 2017 19:02:22.040 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher onFailure(line:144) Fetch failed
      org.apache.kafka.common.errors.DisconnectException: null
      27 Mar 2017 19:02:22.041 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient onComplete(line:376) Cancelled FETCH request ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@8a4a0afc, request=RequestSend(header={api_key=1,api_version=1,correlation_id=88356,client_id=dx-stg02-wdc04-0}, body={replica_id=-1,max_wait_time=500,min_bytes=1,topics=[{topic=logging,partitions=[{partition=32,fetch_offset=125478125,max_bytes=1048576},{partition=2,fetch_offset=121280695,max_bytes=1048576},{partition=3,fetch_offset=125515146,max_bytes=1048576},{partition=35,fetch_offset=121216188,max_bytes=1048576},{partition=38,fetch_offset=121220634,max_bytes=1048576},{partition=7,fetch_offset=121634123,max_bytes=1048576},{partition=39,fetch_offset=125464566,max_bytes=1048576},{partition=8,fetch_offset=125515210,max_bytes=1048576},{partition=11,fetch_offset=121257359,max_bytes=1048576},{partition=43,fetch_offset=121571984,max_bytes=1048576},{partition=44,fetch_offset=125455538,max_bytes=1048576},{partition=14,fetch_offset=121264791,max_bytes=1048576},{partition=15,fetch_offset=125495034,max_bytes=1048576},{partition=47,fetch_offset=121199057,max_bytes=1048576},{partition=19,fetch_offset=121613792,max_bytes=1048576},{partition=20,fetch_offset=125495807,max_bytes=1048576},{partition=23,fetch_offset=121237155,max_bytes=1048576},{partition=26,fetch_offset=121249178,max_bytes=1048576},{partition=27,fetch_offset=125317927,max_bytes=1048576},{partition=31,fetch_offset=121591702,max_bytes=1048576}]}]}), createdTimeMs=1490641342039, sendTimeMs=0) with correlation id 88356 due to node 7 being disconnected
      27 Mar 2017 19:02:22.041 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher onFailure(line:144) Fetch failed
      org.apache.kafka.common.errors.DisconnectException: null
      27 Mar 2017 19:02:22.044 [mirrormaker-thread-0] FATAL kafka.tools.MirrorMaker$MirrorMakerThread fatal(line:116) [mirrormaker-thread-0] Mirror maker thread failure due to 
      org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'responses': Error reading field 'partition_responses': Error reading field 'record_set': java.lang.IllegalArgumentException
      	at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71) ~[kafka-clients-0.9.0.1.jar:?]
      	at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:439) ~[kafka-clients-0.9.0.1.jar:?]
      	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265) ~[kafka-clients-0.9.0.1.jar:?]
      	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320) ~[kafka-clients-0.9.0.1.jar:?]
      	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213) ~[kafka-clients-0.9.0.1.jar:?]
      	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193) ~[kafka-clients-0.9.0.1.jar:?]
      	at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908) ~[kafka-clients-0.9.0.1.jar:?]
      	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) ~[kafka-clients-0.9.0.1.jar:?]
      	at kafka.tools.MirrorMaker$MirrorMakerNewConsumer.receive(MirrorMaker.scala:526) ~[kafka_2.11-0.9.0.1.jar:?]
      	at kafka.tools.MirrorMaker$MirrorMakerThread.run(MirrorMaker.scala:395) [kafka_2.11-0.9.0.1.jar:?]
      27 Mar 2017 19:02:22.046 [mirrormaker-thread-0] INFO  kafka.tools.MirrorMaker$MirrorMakerThread info(line:68) [mirrormaker-thread-0] Flushing producer.
      27 Mar 2017 19:02:22.046 [mirrormaker-thread-0] INFO  kafka.tools.MirrorMaker$MirrorMakerThread info(line:68) [mirrormaker-thread-0] Committing consumer offsets.
      27 Mar 2017 19:02:22.047 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient onComplete(line:376) Cancelled FETCH request ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@f37de83b, request=RequestSend(header={api_key=1,api_version=1,correlation_id=88358,client_id=dx-stg02-wdc04-0}, body={replica_id=-1,max_wait_time=500,min_bytes=1,topics=[{topic=logging,partitions=[{partition=0,fetch_offset=129037541,max_bytes=1048576},{partition=1,fetch_offset=120329329,max_bytes=1048576},{partition=33,fetch_offset=125526115,max_bytes=1048576},{partition=36,fetch_offset=125526627,max_bytes=1048576},{partition=5,fetch_offset=121654333,max_bytes=1048576},{partition=37,fetch_offset=120262628,max_bytes=1048576},{partition=9,fetch_offset=125568321,max_bytes=1048576},{partition=41,fetch_offset=121593740,max_bytes=1048576},{partition=12,fetch_offset=125563836,max_bytes=1048576},{partition=13,fetch_offset=122044962,max_bytes=1048576},{partition=45,fetch_offset=125504213,max_bytes=1048576},{partition=48,fetch_offset=125506892,max_bytes=1048576},{partition=17,fetch_offset=121635934,max_bytes=1048576},{partition=49,fetch_offset=121985309,max_bytes=1048576},{partition=21,fetch_offset=125549718,max_bytes=1048576},{partition=24,fetch_offset=125548506,max_bytes=1048576},{partition=25,fetch_offset=120289719,max_bytes=1048576},{partition=29,fetch_offset=121612535,max_bytes=1048576}]}]}), createdTimeMs=1490641342041, sendTimeMs=0) with correlation id 88358 due to node 5 being disconnected
      27 Mar 2017 19:02:22.047 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher onFailure(line:144) Fetch failed
      org.apache.kafka.common.errors.DisconnectException: null
      27 Mar 2017 19:02:22.048 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient onComplete(line:376) Cancelled FETCH request ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@10fcee02, request=RequestSend(header={api_key=1,api_version=1,correlation_id=88359,client_id=dx-stg02-wdc04-0}, body={replica_id=-1,max_wait_time=500,min_bytes=1,topics=[{topic=logging,partitions=[{partition=32,fetch_offset=125478125,max_bytes=1048576},{partition=2,fetch_offset=121280695,max_bytes=1048576},{partition=3,fetch_offset=125515146,max_bytes=1048576},{partition=35,fetch_offset=121216188,max_bytes=1048576},{partition=38,fetch_offset=121220634,max_bytes=1048576},{partition=7,fetch_offset=121634123,max_bytes=1048576},{partition=39,fetch_offset=125464566,max_bytes=1048576},{partition=8,fetch_offset=125515210,max_bytes=1048576},{partition=11,fetch_offset=121257359,max_bytes=1048576},{partition=43,fetch_offset=121571984,max_bytes=1048576},{partition=44,fetch_offset=125455538,max_bytes=1048576},{partition=14,fetch_offset=121264791,max_bytes=1048576},{partition=15,fetch_offset=125495034,max_bytes=1048576},{partition=47,fetch_offset=121199057,max_bytes=1048576},{partition=19,fetch_offset=121613792,max_bytes=1048576},{partition=20,fetch_offset=125495807,max_bytes=1048576},{partition=23,fetch_offset=121237155,max_bytes=1048576},{partition=26,fetch_offset=121249178,max_bytes=1048576},{partition=27,fetch_offset=125317927,max_bytes=1048576},{partition=31,fetch_offset=121591702,max_bytes=1048576}]}]}), createdTimeMs=1490641342041, sendTimeMs=0) with correlation id 88359 due to node 7 being disconnected
      27 Mar 2017 19:02:22.048 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher onFailure(line:144) Fetch failed
      org.apache.kafka.common.errors.DisconnectException: null
      27 Mar 2017 19:02:22.049 [mirrormaker-thread-0] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerCoordinator handle(line:518) Committed offset ...
      27 Mar 2017 19:02:22.053 [mirrormaker-thread-0] INFO  kafka.tools.MirrorMaker$MirrorMakerThread info(line:68) [mirrormaker-thread-0] Shutting down consumer connectors.
      

      All the brokers and mirrormaker are running 0.9.0.1, there are 50~100 clients that are producing messages in the source cluster, some of them are using the 0.9 client and some are on 0.8.

      We have process that auto restart mirrormaker after 30s when it crashes but it simply crashes again with the same error within 1~10 min. The only solution we found so far is to wait for a few hrs for the source cluster load to go down and then the mirror would work again. Although there is still massive number of messages in the source cluster waiting to be replicated, mirrormaker seems to be handling those just fine.

      We also saw the DisconnectException whenever the SchemaException occurs, not sure whether they are related though.

      Attachments

        Activity

          People

            Unassigned Unassigned
            shangdi Di Shang
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: