[KAFKA-12900] JBOD: Partitions count calculation does not take into account topic name - ASF JIRA


Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.8.0
Fix Version/s: None
Component/s: core, jbod
Labels: None

Description

KAFKA-188 added support for multiple data directories. New partitions are spread across the log dirs based on partition count: the log dir with the fewest partitions is selected as the next dir.
The problem is that this calculation does not take topic names into account. As a result, the partitions of a "fat" topic can end up on fewer disks than they should.
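
To make the current behaviour concrete, here is a minimal sketch of selection by total partition count. It is not the broker's actual code: the function name, the dict-based data layout, and the tie-breaking rule (first dir in iteration order) are assumptions made for illustration.

```python
def pick_log_dir_by_total_count(log_dirs):
    """Return the log dir that holds the fewest partitions overall.

    `log_dirs` maps a dir name to a list of (topic, partition) pairs.
    Topic names are ignored, which is the behaviour this issue describes.
    Ties go to the first dir in iteration order (an assumption of this sketch).
    """
    return min(log_dirs, key=lambda d: len(log_dirs[d]))


# Hypothetical broker state: dir2 holds a single partition.
log_dirs = {
    "dir1": [("t", 1), ("t", 2)],
    "dir2": [("F", 1)],
    "dir3": [("F", 2), ("t", 3)],
}

# dir2 is chosen whatever topic the new partition belongs to.
chosen = pick_log_dir_by_total_count(log_dirs)
```

Because the topic never enters the comparison, a new partition of "F" lands on dir2 even though dir1 holds no "F" partitions at all.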

Example:
Fat topic "F" with partitions: F1, F2, ... , F6
Thin topic "t" with partitions: t1, t2, ... , t6
Log dirs on broker: dir1, dir2, dir3

What we have now in some cases:
dir1: t1 t2 t4 t6
dir2: F1 F3 F4 F5
dir3: F2 t3 t5 F6

There is a skew: four of the six F partitions sit on dir2 and none on dir1. In terms of partition count, however, the layout is "balanced", because every log dir holds the same number of partitions.
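
The gap between the two views can be checked with a short script. This is only an illustrative sketch over the layout listed above; the variable names are made up, and the topic is taken to be the first character of each partition label.

```python
from collections import Counter

# The layout from the example, one list of partition labels per log dir.
layout = {
    "dir1": ["t1", "t2", "t4", "t6"],
    "dir2": ["F1", "F3", "F4", "F5"],
    "dir3": ["F2", "t3", "t5", "F6"],
}

# What the current calculation looks at: total partitions per dir.
total_per_dir = {d: len(parts) for d, parts in layout.items()}

# What it misses: partitions of the fat topic "F" per dir.
fat_per_dir = {
    d: Counter(label[0] for label in parts)["F"]
    for d, parts in layout.items()
}

print(total_per_dir)  # {'dir1': 4, 'dir2': 4, 'dir3': 4}
print(fat_per_dir)    # {'dir1': 0, 'dir2': 4, 'dir3': 2}
```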

It would be better to count, in each log dir, only the partitions of the topic whose partition is about to be created, and then select the log dir with the fewest partitions of that topic. With this approach the partitions from the example above could be spread like this:
dir1: t1 F1 t6 F6
dir2: F2 t2 t4 F4
dir3: F3 t3 t5 F5
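
The proposed rule can be sketched as follows. This is an illustration of the idea, not the attached patch: the function name, the data layout, the tie-break on total partition count, and the creation order (all F partitions first, then all t partitions) are assumptions.

```python
def pick_log_dir_for_topic(log_dirs, topic):
    """Return the log dir holding the fewest partitions of `topic`.

    `log_dirs` maps a dir name to a list of (topic, partition) pairs.
    Ties are broken by total partition count, then by iteration order
    (both tie-breaks are assumptions of this sketch).
    """
    def sort_key(d):
        same_topic = sum(1 for t, _ in log_dirs[d] if t == topic)
        return (same_topic, len(log_dirs[d]))

    return min(log_dirs, key=sort_key)


# Replay the example: six partitions of "F", then six of "t".
log_dirs = {"dir1": [], "dir2": [], "dir3": []}
for topic in ("F", "t"):
    for partition in range(1, 7):
        target = pick_log_dir_for_topic(log_dirs, topic)
        log_dirs[target].append((topic, partition))

# Partitions of each topic per dir after all twelve placements.
per_topic = {
    d: {topic: sum(1 for t, _ in parts if t == topic) for topic in ("F", "t")}
    for d, parts in log_dirs.items()
}
```

Under these assumptions every dir ends up with two F and two t partitions. The exact assignment differs from the listing above, but the per-topic balance is the same.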

In my case this removes the skew, because the producer's partitioner is "round robin" by default and partition sizes are the same.

I've prepared a patch, please check it.

Attachments

KAFKA-12900.patch
06/Jun/21 00:01
1 kB
Georgy

People

Assignee: Unassigned

Reporter: Georgy

Votes: 0

Watchers: 1

Dates

Created: 06/Jun/21 00:00

Updated: 06/Jun/21 00:03