Description
Consider the following code:

KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
List<TopicPartition> listOfPartitions = new ArrayList<>();
for (int i = 0; i < consumer.partitionsFor("IssueTopic").size(); i++)
    listOfPartitions.add(new TopicPartition("IssueTopic", i));
consumer.assign(listOfPartitions);
consumer.pause(listOfPartitions);
consumer.seekToEnd(listOfPartitions);
// consumer.resume(listOfPartitions); // commented out
for (int i = 0; i < listOfPartitions.size(); i++)
    System.out.println(consumer.position(listOfPartitions.get(i)));
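The props used above are not shown in the snippet. For completeness, a typical consumer configuration for a local single-node broker would look something like the sketch below; the broker address and group id here are assumptions, not values from the original report:

```java
import java.util.Properties;

class ConsumerProps {
    static Properties build() {
        Properties props = new Properties();
        // Assumed local single-node broker on the default port
        props.put("bootstrap.servers", "localhost:9092");
        // Hypothetical group id; any unique id would do for this test
        props.put("group.id", "issue-topic-reader");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Manage offsets manually, since the program only inspects positions
        props.put("enable.auto.commit", "false");
        return props;
    }
}
```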
I have created a topic IssueTopic with 3 partitions and a single replica on my single-node Kafka installation (0.10.1.0).
The behavior observed with Kafka client 0.10.1.0, as against Kafka client 0.10.0.1, is as follows.
A) Initially, when there are no messages on IssueTopic, running the above program returns:

0.10.1.0:
0
0
0

0.10.0.1:
0
0
0
B) Next I send 6 messages and see that they are evenly distributed across the three partitions. Running the above program now returns:

0.10.1.0:
0
0
2

0.10.0.1:
2
2
2
Clearly there is a difference in behavior between the two clients.
Now, if I call resume after seekToEnd (i.e. uncomment the resume call in the code above), the behavior is:
0.10.1.0:
2
2
2

0.10.0.1:
2
2
2
This is an issue I came across when using the Spark Kafka integration for 0.10; it started appearing once I moved to Kafka 0.10.1.0. I had raised a pull request to resolve it (SPARK-18779), but looking at the Kafka client implementation and documentation now, the issue appears to lie with Kafka and not with Spark. There does not seem to be any documentation that specifies or implies that resume must be called after seekToEnd for position to return the correct value. There is also a clear difference in behavior between the two Kafka client implementations.
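One plausible reading of the numbers above is that in the newer client seekToEnd only records a pending seek, which is resolved lazily, and a paused partition never gets its position refreshed until it is resumed. The toy model below is NOT the real Kafka client; it is a self-contained sketch of that hypothesis (class and method names are invented). Note it yields a uniform stale result for all paused partitions, so it does not reproduce the exact 0/0/2 asymmetry reported above:

```java
import java.util.*;

// Hypothetical model of a lazily evaluated seekToEnd interacting with pause():
// the seek is only recorded as a pending reset, and the position is refreshed
// only for partitions that are not paused.
class LazySeekModel {
    private final Map<Integer, Long> endOffsets = new HashMap<>(); // partition -> log end offset
    private final Map<Integer, Long> positions = new HashMap<>();  // partition -> fetch position
    private final Set<Integer> paused = new HashSet<>();
    private final Set<Integer> pendingSeekToEnd = new HashSet<>();

    LazySeekModel(Map<Integer, Long> endOffsets) {
        this.endOffsets.putAll(endOffsets);
        for (Integer p : endOffsets.keySet()) positions.put(p, 0L);
    }

    void pause(Collection<Integer> parts)     { paused.addAll(parts); }
    void resume(Collection<Integer> parts)    { paused.removeAll(parts); }
    void seekToEnd(Collection<Integer> parts) { pendingSeekToEnd.addAll(parts); }

    long position(int partition) {
        // The pending seek is resolved lazily; in this model a paused partition
        // never refreshes its position, so the stale value is returned.
        if (pendingSeekToEnd.contains(partition) && !paused.contains(partition)) {
            positions.put(partition, endOffsets.get(partition));
            pendingSeekToEnd.remove(partition);
        }
        return positions.get(partition);
    }
}

class SeekToEndDemo {
    public static void main(String[] args) {
        // 6 messages evenly spread over 3 partitions -> end offset 2 everywhere
        Map<Integer, Long> ends = new HashMap<>();
        ends.put(0, 2L); ends.put(1, 2L); ends.put(2, 2L);
        List<Integer> parts = Arrays.asList(0, 1, 2);

        LazySeekModel pausedOnly = new LazySeekModel(ends);
        pausedOnly.pause(parts);
        pausedOnly.seekToEnd(parts);
        for (int p : parts) System.out.println(pausedOnly.position(p)); // prints 0 0 0 (stale)

        LazySeekModel withResume = new LazySeekModel(ends);
        withResume.pause(parts);
        withResume.seekToEnd(parts);
        withResume.resume(parts);
        for (int p : parts) System.out.println(withResume.position(p)); // prints 2 2 2
    }
}
```

Under this model, calling resume before position (as in the uncommented variant above) is exactly what allows the pending seek to take effect, which matches the observed workaround.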
Attachments
Issue Links
- is duplicated by KAFKA-4845: KafkaConsumer.seekToEnd cannot take effect when integrating with spark streaming (Resolved)
- is related to SPARK-18057: Update structured streaming kafka from 0.10.0.1 to 2.0.0 (Resolved)