Description
I'm clubbing these two together as both are important for mirroring.
(1) Multiple producers:
Shallow iteration (KAFKA-315) helps improve mirroring throughput when
messages are compressed. With shallow iteration, the mirror-maker's consumer
does not do deep iteration over compressed messages. However, when its
embedded producer sends these messages to the target cluster's brokers, the
receiving broker does deep iteration to validate the messages before
appending to the log.
In the current (pre-KAFKA-48) request handling mechanism, one producer
effectively translates to one server-side thread for handling produce
requests, so the target broker is still a bottleneck: it has to decompress
the messages in order to validate them.
One way to work around this is to use broker.list with the same broker
listed several times under different broker ids. E.g.,
broker.list=0:localhost:9191,1:localhost:9191,2:localhost:9191,... which
effectively emulates multiple server-side threads. It would be better to
just add a num.producers option to the mirror-maker and instantiate that
many producers.
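To make the two approaches concrete, here is a minimal Python sketch. It
first builds a broker.list value that repeats one broker under distinct
ids, then models a num.producers option as a pool of producers. The names
emulated_broker_list, ProducerPool and make_producer are hypothetical and
are not part of Kafka. Each "producer" is just a callable standing in for a
real producer instance, and round-robin dispatch is an assumption about how
messages might be spread across producers.

```python
from itertools import cycle


def emulated_broker_list(host, port, copies):
    """Build a broker.list value listing one broker under `copies` ids."""
    return ",".join(f"{i}:{host}:{port}" for i in range(copies))


class ProducerPool:
    """Hypothetical stand-in for a mirror-maker with num.producers producers."""

    def __init__(self, num_producers, make_producer):
        if num_producers < 1:
            raise ValueError("num_producers must be at least 1")
        # One producer per slot; make_producer(i) is an assumed factory.
        self.producers = [make_producer(i) for i in range(num_producers)]
        self._next = cycle(self.producers)

    def send(self, message):
        # Round-robin so produce requests are spread over all producers,
        # and hence over several server-side handler threads.
        producer = next(self._next)
        return producer(message)
```

With num_producers set to N, the target broker would see N independent
connections, which is the same effect the repeated broker.list entries
emulate.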
(2) Retries:
If the mirror-maker uses broker.list and one of the brokers is bounced for
any reason, messages can get lost. Message loss can be reduced/avoided if
the brokers are behind a VIP and if retries are supported. This option will
not work for the zk-based producer because the decision of which broker to
send to has already been made, so retries would go to the same (potentially
still down) broker. (With KAFKA-253 it would work for zk-based producers as
well, but that is only in 0.8).