  Kafka / KAFKA-74

Kafka mirror (corp replica): auto-discovery of topics


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      The corp replica's embedded Kafka consumer requires a whitelist of topics to
      be specified in its configuration. This does not scale well as more topics
      are added. Instead, the consumer could track the current set of topics in
      ZooKeeper. With this approach, there should also be a blacklist configuration
      option for users who wish to omit designated topics from the replica.
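      Tracking topics in ZooKeeper amounts to comparing the topic list seen now
      with the one seen previously and starting to mirror whatever is new and not
      blacklisted. The sketch below shows only that comparison; the class name is
      hypothetical, and how the topic list is actually read from ZooKeeper (which
      znode, whether a watch is used) is deliberately left out as an assumption.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares two snapshots of the source cluster's topic
// list so a mirror can pick up newly created topics without a config change.
// Reading the snapshots from ZooKeeper is not shown here.
public final class TopicDiscovery {

    /** Topics present in the current snapshot but not in the previous one. */
    public static Set<String> newTopics(Set<String> previous, Set<String> current) {
        Set<String> added = new TreeSet<>(current);
        added.removeAll(previous);
        return added;
    }

    /** Newly created topics that are not blacklisted, i.e. the ones to start mirroring. */
    public static Set<String> topicsToStart(Set<String> previous, Set<String> current,
                                            Set<String> blacklist) {
        Set<String> toStart = newTopics(previous, current);
        toStart.removeAll(blacklist);
        return toStart;
    }

    public static void main(String[] args) {
        Set<String> before = Set.of("clicks", "logins");
        Set<String> after = Set.of("clicks", "logins", "orders", "audit");
        System.out.println(topicsToStart(before, after, Set.of("audit"))); // [orders]
    }
}
```

      Running the comparison on every topic-list change keeps the mirror current
      as topics are added, which is the scaling problem the static whitelist has.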

      Furthermore, the terms "replica" and "replication" will become confusing
      once we start working on the replication feature. As part of this issue, we
      can address that ambiguity as well:

      Replication vs. Mirroring:

      Kafka's roadmap includes a "replication" feature
      (https://issues.apache.org/jira/browse/KAFKA-50) that will improve its
      durability and availability guarantees. In the past, we have also used the
      term "replication" to describe the process of building a replica of a Kafka
      cluster. This is done by providing a consumer configuration when starting
      up a Kafka server. The configuration should contain a parameter
      (embeddedconsumer.topics) that whitelists the topics the user wishes to
      replicate. The Kafka server then instantiates an embedded consumer to fetch
      the corresponding logs from the source cluster, and the messages that the
      embedded consumer consumes are written to local Kafka logs.
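      The data flow of building a replica can be modelled in a few lines: the
      embedded consumer, subscribed only to the whitelisted topics, yields
      messages from the source cluster, and the server appends each one to its
      own log for the same topic. This is a simplified sketch, not the server's
      code; SourceMessage and the in-memory map standing in for local logs are
      hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a mirror: the embedded consumer (subscribed only to the
// whitelisted topics) yields messages from the source cluster, and the server
// appends each one to its local log for the same topic. SourceMessage and the
// in-memory "local logs" are hypothetical stand-ins, not Kafka classes.
public final class MirrorSketch {

    /** A message as delivered by the hypothetical embedded consumer. */
    public static final class SourceMessage {
        final String topic;
        final String payload;

        public SourceMessage(String topic, String payload) {
            this.topic = topic;
            this.payload = payload;
        }
    }

    /** Appends each consumed message to the local log of its topic, keeping per-topic order. */
    public static Map<String, List<String>> mirror(Iterable<SourceMessage> consumed) {
        Map<String, List<String>> localLogs = new LinkedHashMap<>();
        for (SourceMessage m : consumed) {
            localLogs.computeIfAbsent(m.topic, t -> new ArrayList<>()).add(m.payload);
        }
        return localLogs;
    }

    public static void main(String[] args) {
        List<SourceMessage> consumed = List.of(
                new SourceMessage("topic1", "a"),
                new SourceMessage("topic2", "b"),
                new SourceMessage("topic1", "c"));
        System.out.println(mirror(consumed)); // {topic1=[a, c], topic2=[b]}
    }
}
```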

      To avoid confusion between the two features going forward, I think it would
      be good to draw a clearer distinction. We can call the former feature
      "replication" and the latter (i.e., building a replica) "mirroring". So, if
      the user provides an (embedded) consumer configuration to the Kafka server,
      the server will implicitly run as a "mirror". We can also make the related
      config parameters clearer, as described below.

      Config change - Default topic whitelists for mirroring:

      The embedded consumer's whitelist is currently specified as part of
      ConsumerConfig, e.g., embeddedconsumer.topics=topic1:3,topic2:1. However,
      the common case is to mirror all topics. It may therefore be more convenient
      to discover topics through the source cluster's ZooKeeper, mirror all topics
      by default, and provide a new blacklist configuration option. Users who wish
      to mirror only a few topics can still use the whitelist option.
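      The current whitelist value is a comma-separated list of topic:count pairs,
      as in the example above. A minimal parser for that format might look like
      the sketch below. Reading the count as the number of consumer streams per
      topic is an assumption here, as is the class name WhitelistParser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Parses a whitelist in the "topic:count,topic:count" form used by
// embeddedconsumer.topics. Treating the count as the number of consumer
// streams per topic is an assumption of this sketch.
public final class WhitelistParser {

    public static Map<String, Integer> parse(String spec) {
        Map<String, Integer> topics = new LinkedHashMap<>(); // keeps config order
        for (String entry : spec.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate trailing or doubled commas
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected topic:count but got '" + trimmed + "'");
            }
            String topic = trimmed.substring(0, colon).trim();
            // Integer.parseInt throws NumberFormatException (an IllegalArgumentException) on bad input.
            int count = Integer.parseInt(trimmed.substring(colon + 1).trim());
            if (count < 1) {
                throw new IllegalArgumentException("Count must be positive for topic " + topic);
            }
            topics.put(topic, count);
        }
        return topics;
    }

    public static void main(String[] args) {
        System.out.println(parse("topic1:3,topic2:1")); // {topic1=3, topic2=1}
    }
}
```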

      At most one of the following options can be present in the embedded
      consumer's configuration. If neither option is present, all topics will be
      mirrored.

      • mirror.topics.blacklist: topics to skip when mirroring.
      • mirror.topics.whitelist: alias for embeddedconsumer.topics, which can
        eventually be deprecated.
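      The selection rule above (at most one option; mirror everything if neither
      is set) can be sketched as a small function over the discovered topic set.
      The option names come from this proposal; the comma-separated value format,
      ignoring any ":count" suffix on whitelist entries, and the class name are
      assumptions of the sketch.

```java
import java.util.Arrays;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Decides which discovered topics to mirror from the proposed
// mirror.topics.whitelist / mirror.topics.blacklist options. Comma-separated
// values, and dropping any ":count" suffix on whitelist entries, are
// assumptions of this sketch.
public final class MirrorTopicSelector {

    static final String WHITELIST = "mirror.topics.whitelist";
    static final String BLACKLIST = "mirror.topics.blacklist";

    private static Set<String> topicNames(String value) {
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(s -> s.contains(":") ? s.substring(0, s.indexOf(':')).trim() : s)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static Set<String> select(Properties config, Set<String> discovered) {
        String white = config.getProperty(WHITELIST);
        String black = config.getProperty(BLACKLIST);
        if (white != null && black != null) {
            throw new IllegalArgumentException(
                    "Specify at most one of " + WHITELIST + " and " + BLACKLIST);
        }
        Set<String> selected = new TreeSet<>(discovered);
        if (white != null) {
            selected.retainAll(topicNames(white)); // mirror only the listed topics
        } else if (black != null) {
            selected.removeAll(topicNames(black)); // mirror everything except the listed topics
        }
        // Neither option present: every discovered topic is mirrored.
        return selected;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(BLACKLIST, "audit, tmp");
        System.out.println(select(props, Set.of("clicks", "audit", "orders", "tmp"))); // [clicks, orders]
    }
}
```

      In this sketch a whitelisted topic is mirrored only once it has been
      discovered in the source cluster; that is a simplification chosen here,
      not part of the proposal.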

      Attachments

        1. svn_diff_1153618_1312409247 (47 kB, Joel Jacob Koshy)
        2. svn_diff_1153618_1312569290 (47 kB, Joel Jacob Koshy)
        3. patch_1155041_v3 (45 kB, Joel Jacob Koshy)
        4. patch_1155041_v4 (48 kB, Joel Jacob Koshy)
        5. patch_v5 (44 kB, Joel Jacob Koshy)


          People

            Jun Rao (junrao)
            Joel Jacob Koshy (jjkoshy)
            Votes: 0
            Watchers: 1
