Details
Type: Bug
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.0.0
Description
By default, the Kafka client automatically generates a unique client ID.
The client ID is used by many data lineage tools to identify consumers and producers (for consumers the consumer group is also used, but for producers only the client ID is available).
Setting [client.id](https://kafka.apache.org/documentation/#producerconfigs_client.id) in the options passed to a Spark Kafka read or write is not possible, as it would force the same client.id on at least both the driver and the executors.
What could be done instead is to allow passing a Spark-specific option, perhaps named `clientIdPrefix`.
e.g.
```scala
// `clientIdPrefix` is the proposed option; it does not exist in Spark today.
val df = spark
  .read
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribePattern", "topic.*")
  .option("startingOffsets", "earliest")
  .option("endingOffsets", "latest")
  .option("clientIdPrefix", "my-workflow-")
  .load()
```
A possible implementation would be to update [InternalKafkaProducerPool](https://github.com/apache/spark/blob/master/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/producer/InternalKafkaProducerPool.scala#L75), or maybe Spark's `KafkaConfigUpdater`.
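Whichever class ends up hosting the change, the core idea could look roughly like the sketch below: when a prefix is supplied, each Kafka client gets a `client.id` made of the prefix plus a unique suffix, so the driver and the executors never share an ID. The object `ClientIdPrefixSketch` and the method `withPrefixedClientId` are hypothetical names used only for illustration and are not part of Spark. The use of a random UUID as the suffix is also an assumption.

```scala
import java.util.UUID

// Hypothetical helper, not an existing Spark API.
object ClientIdPrefixSketch {

  // Kafka's configuration key for the client ID.
  val ClientIdKey = "client.id"

  /**
   * Returns a copy of `kafkaParams` with a unique `client.id` derived from
   * `clientIdPrefix`. If no prefix is given, the parameters are returned
   * unchanged and the Kafka client keeps generating its own ID.
   */
  def withPrefixedClientId(
      kafkaParams: Map[String, String],
      clientIdPrefix: Option[String]): Map[String, String] = {
    clientIdPrefix match {
      case Some(prefix) =>
        // Assumption: a random UUID suffix is enough to keep IDs unique
        // across the driver and all executors.
        kafkaParams + (ClientIdKey -> s"$prefix${UUID.randomUUID()}")
      case None =>
        kafkaParams
    }
  }
}
```

Calling `ClientIdPrefixSketch.withPrefixedClientId(params, Some("my-workflow-"))` once per created producer or consumer would give every client a distinct ID that lineage tools can still group by the `my-workflow-` prefix.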