Details
Type: Bug
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.4.5, 3.0.0
Fix Version/s: None
Component/s: None
Description
Spark Structured Streaming's Kafka integration provides the assign strategy for consuming data from Kafka. This strategy assumes manual assignment of offsets in topic partitions. According to the KafkaConsumer specification, the consumer group is not used with the "assign" strategy.
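For reference, a minimal sketch of the assign strategy as exposed by the Kafka source (broker address, topic, and partition numbers are placeholders):

```scala
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")      // placeholder broker
  .option("assign", """{"topic1":[0,1]}""")             // explicit topic partitions
  .load()
```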
When creating a consumer to read data, Spark provides an internally-generated group id like this:
val uniqueGroupId = s"spark-kafka-relation-${UUID.randomUUID}"
This is done for every consumer strategy, even for "assign". The problem is that with a secured Kafka cluster, a client cannot use an arbitrary consumer group id. As a result, a Structured Streaming application fails with an exception like:
org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: spark-kafka-relation-ecab045d-4ee6-425e-88a0-495d4100a013-driver-0
With Spark 2.4.5, the only way around this is to reconfigure the broker by adding the needed entries to the ACL (see, for example, this discussion on StackOverflow).
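Such a broker-side workaround might look like the following prefixed ACL, which authorizes any group id starting with Spark's generated prefix (the principal name is a placeholder):

```shell
# Allow the streaming application's principal to use any consumer group
# whose id starts with "spark-kafka-relation-"
kafka-acls.sh --bootstrap-server host1:9092 --add \
  --allow-principal User:streaming-app \
  --operation Read \
  --group spark-kafka-relation- \
  --resource-pattern-type prefixed
```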
With Spark 3.0.0, this problem can be avoided by two workarounds:
SPARK-26121: specify a custom prefix for the consumer group id generated by Spark (a prefix allowed by the broker)
SPARK-26350: specify a custom group id
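The two workarounds map onto the source options groupIdPrefix and kafka.group.id; a sketch with placeholder prefix and group id values:

```scala
// Workaround 1 (SPARK-26121): a broker-approved prefix for the generated group id
spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("assign", """{"topic1":[0,1]}""")
  .option("groupIdPrefix", "allowed-prefix")        // placeholder prefix
  .load()

// Workaround 2 (SPARK-26350): a fixed, pre-authorized group id
spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("assign", """{"topic1":[0,1]}""")
  .option("kafka.group.id", "authorized-group")     // placeholder group id
  .load()
```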
However, with the "assign" strategy the user should not need to worry about consumer groups at all, since the consumer group is disregarded. A better fix, therefore, would be to not set the consumer property "group.id" when the "assign" strategy is used.
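The proposed fix could be sketched as a conditional in the consumer-parameter setup; the names below are hypothetical and do not match Spark's internals exactly:

```scala
// Hypothetical sketch: only set group.id when group management is actually used,
// i.e. skip it entirely for the "assign" strategy.
val baseParams = Map[String, Object]("bootstrap.servers" -> "host1:9092")
val kafkaParams =
  if (strategy == "assign") baseParams                      // no group.id needed
  else baseParams + ("group.id" -> uniqueGroupId)           // generated id as today
```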