Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The corp replica's Kafka embedded consumer requires a whitelist of topics to be
specified in its configuration. This does not scale well as more and more
topics are added. Instead, the consumer can keep track of the current topics in
ZooKeeper. With this approach, there should also be a blacklist configuration for
users who wish to omit designated topics from the replica.
Furthermore, the "replica/replication" terms can become confusing when we start
working on the replication feature. So, as part of this issue, we can address this
ambiguity as well:
Replication vs. Mirroring:
Kafka's roadmap includes a "replication" feature
(https://issues.apache.org/jira/browse/KAFKA-50) that will improve its
durability and availability guarantees. In the past, we have also used the
term "replication" to describe the process of building a replica of a Kafka
cluster. This is done by providing a consumer configuration when starting up
a Kafka server. The configuration should contain a parameter
(embeddedconsumer.topics), which is a whitelist of topics that the user
wishes to replicate. The Kafka server then instantiates an embedded consumer
to fetch the corresponding logs from the source cluster. The messages that
the embedded consumer consumes are written to the local Kafka logs.
In order to avoid any confusion between the two features going forward, I
think it will be good to make a clearer distinction. We can call the former
feature "replication", and the latter feature (i.e., building a replica)
"mirroring". So, if the user provides an (embedded) consumer configuration
to the Kafka server, then it will implicitly run as a "mirror". We can also
improve the clarity of the related config parameters as described below.
Config change - Default topic whitelists for mirroring:
The embedded consumer's whitelist is currently specified as part of
ConsumerConfig, e.g., embeddedconsumer.topics=topic1:3,topic2:1. However, the
common case is to mirror all topics. Therefore, it may be more convenient to
discover topics through the source cluster's ZooKeeper, mirror all topics by
default and provide a new blacklist configuration option. If you wish to
mirror only a few topics, the whitelist option is still available.
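To make the existing whitelist format concrete, here is a minimal sketch of
how a value such as topic1:3,topic2:1 could be parsed. The function name
parse_topic_counts is hypothetical, and the reading of the number after the
colon as a per-topic consumer thread count is an assumption; this issue does
not define it.

```python
def parse_topic_counts(spec):
    """Parse a 'topic1:3,topic2:1' style value into {'topic1': 3, 'topic2': 1}.

    Assumption: the integer after the colon is a per-topic count (for
    example, consumer threads). The issue text does not spell this out.
    """
    counts = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate blank entries and trailing commas
        topic, sep, count = entry.rpartition(":")
        topic = topic.strip()
        if not sep or not topic:
            raise ValueError(f"expected 'topic:count', got {entry!r}")
        value = int(count)  # raises ValueError if the count is not an integer
        if value < 1:
            raise ValueError(f"count must be positive, got {entry!r}")
        counts[topic] = value
    return counts


if __name__ == "__main__":
    print(parse_topic_counts("topic1:3,topic2:1"))
```

Rejecting malformed entries up front keeps a typo in the whitelist from
silently dropping a topic from the mirror.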
At most one of the following options can be present in the embedded
consumer's configuration. If neither option is present, all topics will be
mirrored.
mirror.topics.blacklist: (topics to skip for mirroring)
mirror.topics.whitelist: (alias for embeddedconsumer.topics, which can
eventually be deprecated)
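
The selection rule described above (at most one of the two options, mirror
everything if neither is set) could be sketched as follows. This is only an
illustration, not the actual implementation: select_topics_to_mirror is a
hypothetical name, the blacklist is assumed to be a comma-separated list of
topic names, the whitelist is assumed to keep the embeddedconsumer.topics
format (any ":count" suffix is ignored here), and the whitelist is assumed to
be intersected with the topics discovered through ZooKeeper.

```python
def _topic_names(value):
    """Split a comma-separated option value into a set of topic names.

    Any ':count' suffix (as in embeddedconsumer.topics) is dropped.
    """
    return {e.strip().split(":")[0] for e in value.split(",") if e.strip()}


def select_topics_to_mirror(discovered_topics, config):
    """Return the sorted list of topics a mirror would consume.

    discovered_topics: topic names found in the source cluster (for
        example, read from its ZooKeeper); supplied by the caller here.
    config: dict of embedded consumer properties.
    """
    # mirror.topics.whitelist is treated as an alias for the older
    # embeddedconsumer.topics; preferring the new name when both are set
    # is an assumption of this sketch.
    whitelist = (config.get("mirror.topics.whitelist")
                 or config.get("embeddedconsumer.topics"))
    blacklist = config.get("mirror.topics.blacklist")

    if whitelist and blacklist:
        raise ValueError("specify at most one of whitelist and blacklist")

    if whitelist:
        wanted = _topic_names(whitelist)
        return sorted(t for t in discovered_topics if t in wanted)
    if blacklist:
        skipped = _topic_names(blacklist)
        return sorted(t for t in discovered_topics if t not in skipped)
    # Neither option present: mirror all topics.
    return sorted(discovered_topics)


if __name__ == "__main__":
    topics = ["clicks", "logs", "metrics"]
    print(select_topics_to_mirror(topics, {"mirror.topics.blacklist": "logs"}))
    print(select_topics_to_mirror(topics, {}))
```

Failing fast when both options are set avoids having to define a precedence
between the whitelist and the blacklist.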