[KUDU-3028] Prefer running concurrent flushes/compactions on different data directories, if possible - ASF JIRA


Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: tserver
Labels:

Description

In a Kudu cluster where each tablet server has 9 data directories, each backed by a separate HDD (spinning disk), and 3 maintenance manager threads, I noticed a long period (about 2 hours) of 100% I/O saturation on one drive, followed by another long period of 100% I/O saturation on a different drive.

I noticed that all 3 maintenance threads were hammering the same data directory for a long time, which is why its backing drive sat at 100% I/O saturation. Then they switched to another data directory and saturated the I/O there. That led to extremes such as sync calls taking tens of seconds to complete. With a higher number of data directories and maintenance threads, this may become even more extreme.

W1218 12:10:04.712692 247413 env_posix.cc:889] Time spent sync call for /data/6/kudu/tablet/data/data/4b1f42243784484b85a57255c88d8b93.metadata: real 27.245s      user 0.000s     sys 0.000s
W1218 12:10:04.712724 247412 env_posix.cc:889] Time spent sync call for /data/6/kudu/tablet/data/data/128658789b56415b82becf42f34c4af1.metadata: real 27.244s      user 0.000s     sys 0.000s
W1218 12:11:22.690099 247411 env_posix.cc:889] Time spent sync call for /data/6/kudu/tablet/data/data/ad4c53b4e230488899f55e6580c070af.data: real 15.357s  user 0.000s     sys 0.000s

W1218 14:17:30.151391 247412 env_posix.cc:889] Time spent sync call for /data/3/kudu/tablet/data/data/165c86d614c54f9f8bfaf01361ceca16.data: real 10.674s       user 0.000s     sys 0.000s
W1218 14:17:30.151448 247413 env_posix.cc:889] Time spent sync call for /data/3/kudu/tablet/data/data/820354e482be40f9858b29484c2db5c6.metadata: real 11.807s   user 0.000s     sys 0.000s
W1218 14:17:30.151460 247411 env_posix.cc:889] Time spent sync call for /data/3/kudu/tablet/data/data/483a57ac212544f3b39cbe887bf16946.metadata: real 23.472s   user 0.000s     sys 0.000s

It would be nice to schedule compactions and flushes so that they are spread across the available data directories, when possible.
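As a rough illustration of this idea, the Python sketch below models a scheduler that, given the runnable flush/compaction candidates and a count of operations already running on each data directory, prefers the candidate whose directory is least busy and only then considers the expected benefit. `CandidateOp`, `pick_next_op`, and the `perf_improvement` score are hypothetical names, not Kudu's maintenance manager API. The sketch also assumes each operation's I/O lands mostly on a single data directory; in practice a tablet's blocks can be spread over several directories.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class CandidateOp:
    """A runnable flush or compaction (hypothetical model, not Kudu's API)."""
    name: str
    data_dir: str            # assumed: the directory receiving most of this op's I/O
    perf_improvement: float  # assumed score: higher means more valuable to run


def pick_next_op(candidates: Sequence[CandidateOp],
                 running_per_dir: Mapping[str, int]) -> Optional[CandidateOp]:
    """Return the op to launch next, or None if there is nothing to run.

    Primary key: fewest ops already running on the op's data directory.
    Secondary key: highest perf_improvement (hence the negation under min()).
    """
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda op: (running_per_dir.get(op.data_dir, 0), -op.perf_improvement),
    )


ops = [
    CandidateOp("compact-a", "/data/6", 0.9),
    CandidateOp("compact-b", "/data/6", 0.8),
    CandidateOp("flush-c", "/data/3", 0.5),
]
# Two ops already run on /data/6, so the idle /data/3 wins despite a lower score.
print(pick_next_op(ops, {"/data/6": 2}).name)  # flush-c
```

Ranking strictly by directory load first trades some per-operation benefit for spreading the I/O; a real policy would likely weigh the two together rather than use a strict ordering.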

Also, it would be great to establish a limit on the number of concurrent compactions/flushes per data directory, so that even with a higher number of data directories it would be possible to prevent all the flushing/compacting threads from hammering a single data directory.
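A per-directory cap could be layered on top of any scheduling policy. The following sketch uses hypothetical names (`PerDirLimiter`, `try_acquire`, `release`, `max_ops_per_dir`) that are not taken from Kudu: it keeps a counter per data directory behind a lock, and a maintenance thread that fails to reserve a slot would be expected to move on to a candidate on a different directory instead of piling onto a saturated drive.

```python
import threading
from collections import defaultdict


class PerDirLimiter:
    """Caps concurrent flushes/compactions per data directory (hypothetical sketch)."""

    def __init__(self, max_ops_per_dir: int) -> None:
        if max_ops_per_dir < 1:
            raise ValueError("max_ops_per_dir must be >= 1")
        self._max = max_ops_per_dir
        self._running = defaultdict(int)  # data_dir -> ops currently running there
        self._lock = threading.Lock()

    def try_acquire(self, data_dir: str) -> bool:
        """Reserve a slot on data_dir; return False if the directory is at its cap."""
        with self._lock:
            if self._running[data_dir] >= self._max:
                return False
            self._running[data_dir] += 1
            return True

    def release(self, data_dir: str) -> None:
        """Free a slot previously obtained with try_acquire()."""
        with self._lock:
            if self._running[data_dir] <= 0:
                raise RuntimeError(f"release() without try_acquire() on {data_dir}")
            self._running[data_dir] -= 1

    def running(self, data_dir: str) -> int:
        """Number of ops currently holding a slot on data_dir."""
        with self._lock:
            return self._running.get(data_dir, 0)


# Three maintenance threads, at most two ops per directory:
limiter = PerDirLimiter(max_ops_per_dir=2)
results = [limiter.try_acquire("/data/6") for _ in range(3)]
print(results)  # [True, True, False]: the third thread must look elsewhere
```

A non-blocking `try_acquire` is used so that a thread does not sleep waiting on a busy disk while other directories sit idle.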

Another approach might be switching from the multi-directory structure to a volume-based approach, where the filesystem or a controller takes care of fanning out the I/O to the multiple drives backing the volume.


Issue Links

is related to

KUDU-1952 round-robin block allocation can place all blocks for a given column on one disk (Resolved)


People

Assignee: Unassigned

Reporter: Alexey Serbin

Votes: 0

Watchers: 2

Dates

Created: 19/Dec/19 00:31

Updated: 19/Dec/19 01:42