SPARK-38715: Would be nice to be able to configure a client ID pattern in Kafka integration


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None

    Description

By default, the Kafka client automatically generates a unique client ID.
      The client ID is used by many data lineage tools to track consumers and producers (for consumers the consumer group is also used, but for producers only the client ID is available).

Setting [client.id](https://kafka.apache.org/documentation/#producerconfigs_client.id) in the options passed to a Spark Kafka read or write is not possible, as it would force the same client.id on at least both the driver and the executors.
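
      For illustration, the only handle today is the pass-through Kafka option, which pins one fixed client.id on the driver and every executor client alike (the `my-workflow` value below is just a placeholder):

      ```scala
      // Current workaround: pass client.id through as a kafka.-prefixed
      // option. Every Kafka client Spark creates then shares this one ID.
      val df = spark
        .read
        .format("kafka")
        .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
        .option("subscribe", "topic1")
        .option("kafka.client.id", "my-workflow") // same ID everywhere
        .load()
      ```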

What could be done is to support a Spark-specific option, maybe named `clientIdPrefix`.

      e.g.

      ```scala
      val df = spark
        .read
        .format("kafka")
        .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
        .option("subscribePattern", "topic.*")
        .option("startingOffsets", "earliest")
        .option("endingOffsets", "latest")
        .option("clientIdPrefix", "my-workflow-") // the proposed new option
        .load()
      ```
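
      The same (proposed) `clientIdPrefix` option would presumably apply on the write path as well, e.g.:

      ```scala
      // Hypothetical write-side usage of the proposed option; the topic and
      // bootstrap servers are placeholders.
      df.write
        .format("kafka")
        .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
        .option("topic", "topic1")
        .option("clientIdPrefix", "my-workflow-")
        .save()
      ```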

A possible implementation would be to update [InternalKafkaProducerPool](https://github.com/apache/spark/blob/master/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/producer/InternalKafkaProducerPool.scala#L75), or maybe Spark's `KafkaConfigUpdater`?
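
      A minimal sketch of the prefixing logic, independent of where it would hook in (names such as `ClientIdConfigurer` and `withClientId` are made up for illustration; the real change would live in the pooling/config code linked above):

      ```scala
      import java.{util => ju}
      import java.util.UUID

      // Hypothetical helper: applies the proposed `clientIdPrefix` option by
      // generating a distinct client.id per Kafka client instance, so the
      // driver and each executor get unique IDs sharing a common prefix.
      object ClientIdConfigurer {
        // Spark treats source option keys case-insensitively (lower-cased).
        val ClientIdPrefixOption = "clientidprefix"

        def withClientId(
            sourceOptions: Map[String, String],
            kafkaParams: ju.Map[String, Object]): ju.Map[String, Object] = {
          val updated = new ju.HashMap[String, Object](kafkaParams)
          sourceOptions.get(ClientIdPrefixOption).foreach { prefix =>
            // Respect an explicit kafka.client.id if the user forced one;
            // the random suffix keeps every generated client ID unique.
            if (!updated.containsKey("client.id")) {
              updated.put("client.id", s"$prefix${UUID.randomUUID()}")
            }
          }
          updated
        }
      }
      ```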

People

            Assignee: Unassigned
            Reporter: cchantepie (Cédric Chantepie)
