KAFKA-8177

Allow for separate connect instances to have sink connectors with the same name

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: connect

    Description

      If you have multiple Connect instances (each either a single standalone worker or a distributed group of workers) running against the same Kafka cluster, the instances cannot each have a sink connector with the same name and still operate independently. This is because the consumer group ID that a sink connector uses internally to read from its source topic(s) is derived entirely from the connector's name: https://github.com/apache/kafka/blob/d0e436c471ba4122ddcc0f7a1624546f97c4a517/connect/runtime/src/main/java/org/apache/kafka/connect/util/SinkUtils.java#L24
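
      To make the collision concrete, here is a minimal sketch of the naming rule as I read it from the linked SinkUtils line (a "connect-" prefix followed by the connector name). The class name and the connector name "jdbc-sink" are made up for illustration; this is a paraphrase, not the actual Connect code.

```java
public class SinkGroupIdCollision {
    // Paraphrase of the rule in the linked SinkUtils: the group ID depends
    // only on the connector name, not on which Connect instance runs it.
    static String consumerGroupId(String connectorName) {
        return "connect-" + connectorName;
    }

    public static void main(String[] args) {
        // Two unrelated Connect instances, each with a sink named "jdbc-sink"
        // (hypothetical name), pointed at the same Kafka cluster.
        String groupOnInstanceA = consumerGroupId("jdbc-sink");
        String groupOnInstanceB = consumerGroupId("jdbc-sink");

        System.out.println(groupOnInstanceA);
        // Same group ID, so the two sinks share partitions and offsets
        // instead of each reading the full topic.
        System.out.println(groupOnInstanceA.equals(groupOnInstanceB));
    }
}
```

      Because both instances land in the same consumer group, the broker splits the topic partitions between them rather than delivering every record to each.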

      The documentation of Connect implies to me that it supports "multi-tenancy," that is, as long as...

      • In standalone mode, the offset.storage.file.filename is not shared between instances
      • In distributed mode, group.id, config.storage.topic, offset.storage.topic, and status.storage.topic are not shared between instances

      ... then the Connect instances can operate completely independently without fear of conflict. But the sink connector consumer group naming policy makes this untrue. Independence can of course be achieved by naming connectors uniquely across instances, but in some environments that could be a nuisance, or a challenging policy to enforce. For instance, imagine a large group of developers or data analysts all running their own standalone Connect to load into a SQL database for their own analysis, or mirroring to their own local cluster for testing.

      The obvious solution is to allow supplying config that gives a Connect instance some notion of identity, and to use that identity when creating the sink task consumer group. Distributed mode already has such an identity (group.id), but one would need to be added for standalone mode. Maybe instance.id? Given that solution, it seems like this would need a small KIP.
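
      A rough sketch of what that could look like, purely as an illustration: the instance identity (group.id in distributed mode, or the hypothetical instance.id in standalone mode) is folded into the group ID when present, and the current naming is kept when it is absent. The class name, method signature, and the "connect-<instance>-<connector>" format are all assumptions of mine, not an existing or agreed-upon Connect API.

```java
import java.util.Optional;

public class InstanceScopedGroupId {
    // Hypothetical rule: include the instance identity if one is configured,
    // otherwise fall back to the existing "connect-<connector>" form.
    static String consumerGroupId(Optional<String> instanceId, String connectorName) {
        return instanceId
                .map(id -> "connect-" + id + "-" + connectorName)
                .orElse("connect-" + connectorName);
    }

    public static void main(String[] args) {
        // Two instances with distinct (made-up) identities and the same connector name.
        System.out.println(consumerGroupId(Optional.of("alice"), "jdbc-sink"));
        System.out.println(consumerGroupId(Optional.of("bob"), "jdbc-sink"));
        // No identity configured: behaves like today.
        System.out.println(consumerGroupId(Optional.empty(), "jdbc-sink"));
    }
}
```

      Keeping the identity optional means existing deployments would keep their current group IDs and committed offsets unless they opt in.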

      I could also imagine solving this problem through better documentation ("ensure your connector names are unique!"), but carrying that subtlety doesn't seem worth it to me. (Optionally) assigning an identity to every Connect instance seems strictly clearer, without any downside.

          People

            Assignee: Unassigned
            Reporter: Paul Whalen (pgwhalen)