Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
Description
It seems I'm reaching that codepath when running reassignments on my cluster, and segments are deleted from the remote store despite a huge retention (the topic was created a few hours ago with 1000h retention).
It seems to happen consistently on some partitions when reassigning, but not on all partitions.
My test:
I have a test topic with 30 partitions, configured with 1000h global retention and 2 minutes local retention (see the config sketch below).
I have a load tester producing to all partitions evenly.
I have a consumer load tester consuming that topic.
I regularly reset offsets to earliest on my consumer to test backfilling from tiered storage.
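For reference, a minimal sketch of creating a topic with the configuration described above via Kafka's Admin API; the bootstrap address and replication factor are assumptions, not details from the actual cluster:

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class CreateLoadtestTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed bootstrap address; replace with the real cluster address.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // 30 partitions; replication factor 3 is an assumption (3-broker cluster at creation time).
            NewTopic topic = new NewTopic("loadtest11", 30, (short) 3)
                    .configs(Map.of(
                            "remote.storage.enable", "true",  // tiered storage enabled for the topic
                            "retention.ms", "3600000000",     // 1000h global retention
                            "local.retention.ms", "120000")); // 2 minutes local retention
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}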
My consumer was catching up on the backlog and I wanted to upscale my cluster to speed up recovery: I upscaled the cluster from 3 to 12 brokers and reassigned my test topic across all available brokers to get an even leader/follower count per broker.
When I triggered the reassignment, the consumer lag dropped on some of my topic partitions:
(Screenshot 2023-08-28 at 20:57:09: consumer lag dropping on some partitions)
Later I tried to reassign my topic back to 3 brokers and the issue happened again.
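For reference, a minimal sketch of issuing such a reassignment with the Admin API; the broker ids, bootstrap address, and replica placement below are illustrative assumptions, not the exact assignment that was used:

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.common.TopicPartition;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

public class ReassignLoadtestTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        try (Admin admin = Admin.create(props)) {
            Map<TopicPartition, Optional<NewPartitionReassignment>> reassignments = new HashMap<>();
            for (int p = 0; p < 30; p++) {
                // Spread replicas across the 12 brokers; purely illustrative placement.
                List<Integer> replicas = List.of(p % 12, (p + 1) % 12, (p + 2) % 12);
                reassignments.put(new TopicPartition("loadtest11", p),
                        Optional.of(new NewPartitionReassignment(replicas)));
            }
            admin.alterPartitionReassignments(reassignments).all().get();
        }
    }
}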
Both times, my logs contained a bunch of entries like:
[RemoteLogManager=10005 partition=uR3O_hk3QRqsn4mPXGFoOw:loadtest11-17] Deleted remote log segment RemoteLogSegmentId{topicIdPartition=uR3O_hk3QRqsn4mPXGFoOw:loadtest11-17, id=Mk0chBQrTyKETTawIulQog} due to leader epoch cache truncation. Current earliest epoch: EpochEntry(epoch=14, startOffset=46776780), segmentEndOffset: 46437796 and segmentEpochs: [10]
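Reading that line: the segment's only epoch (10) is older than the earliest epoch left in the leader epoch cache (14), and its end offset (46437796) is below that epoch's start offset (46776780). A simplified illustration of the condition the log line implies, using those values (this is not the actual RemoteLogManager code; names and structure are assumed):

import java.util.List;

public class EpochTruncationCheck {
    public static void main(String[] args) {
        // Values taken from the log line above.
        int earliestEpoch = 14;                    // EpochEntry(epoch=14, ...)
        long earliestEpochStartOffset = 46776780L; // EpochEntry(..., startOffset=46776780)
        long segmentEndOffset = 46437796L;         // segmentEndOffset: 46437796
        List<Integer> segmentEpochs = List.of(10); // segmentEpochs: [10]

        // A remote segment whose epochs all predate the earliest entry left in the
        // leader epoch cache, and which ends before that entry's start offset, is
        // treated as orphaned by the truncated cache and becomes eligible for
        // deletion regardless of retention.ms.
        boolean allEpochsBeforeEarliest = segmentEpochs.stream().allMatch(e -> e < earliestEpoch);
        boolean endsBeforeEarliestStart = segmentEndOffset < earliestEpochStartOffset;
        System.out.println("deleted due to leader epoch cache truncation: "
                + (allEpochsBeforeEarliest && endsBeforeEarliestStart)); // true for these values
    }
}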
Looking at my S3 bucket, the segments from before my reassignment have indeed been deleted.
Attachments
Issue Links
- relates to: KAFKA-7739 Kafka Tiered Storage (Resolved)
- links to