Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-14064

MirrorMaker2 stops task when record is too big

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • 2.7.1
    • None
    • mirrormaker
    • None

    Description

      As MirrorMaker2 does currently not support shallow mirrorring (KIP-712: Shallow Mirroring), if a producer has produced using compression in one mirrorred topic, MirrorMaker2 will get the message uncompressed at some point and if not properly tuned (typically max.request.size), it may fail with a RecordTooLargeException:
      org.apache.kafka.connect.errors.ConnectException: Unrecoverable exception from producer send callback
          at org.apache.kafka.connect.runtime.WorkerSourceTask.maybeThrowProducerSendException(WorkerSourceTask.java:284)
          at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:338)
          at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:256)
          at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
          at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:238)
          at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
          at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at java.base/java.lang.Thread.run(Thread.java:829)
          Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The message is 1049087 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration.\n"
                worker_id: 'xxx.xxx.xxx.xxx:8083'

      The task is stopped and needs a manual restart.
      However, this seems to be a bit overkill because, amongst all partitions replicated by the task, only one is problematic. Stopping the replication on all partitions can make a severe impact.
      It would be more optimized to 'suspend' the partition involved and keep replication working for all remaining ones.

      Attachments

        Activity

          People

            Unassigned Unassigned
            ddufour1a David Dufour
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: