[KUDU-2204] Revamp storage of tablet server metadata - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.6.0
Fix Version/s: None
Component/s: consensus, tserver
Labels:
- roadmap-candidate

Description

Since the very early days of Kudu we've used the file system as a sort of key-value store for various bits of metadata – in particular, each tablet has a "tablet metadata" file and a "consensus metadata" file. We also have various per-disk metadata, as well as LBM block metadata, but I'll leave those out of the present discussion.

Although the file system "works" and got us pretty far, there are various benefits we could gain by switching to something closer to a KV-store:

- write performance: writing a single file per "record", fsyncing it, and renaming it to replace the old version is pretty slow. Empirically, file systems do a poor job of group-committing many such operations issued at the same time, especially as the disks get busy.
- read performance: on startup, we need to read all of these metadata files, and keeping them as separate files means they are likely to sit in entirely separate areas of the disk. If we have, for example, 100k tablets in the future, we'll pay 100k random reads, which could take 10+ minutes on a cold hard disk.
- scalability: if we want to get to 100k+ tablets per server, we enter territory where file systems are unhappy with such a high number of files.
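To make the write-path contrast concrete, here is a minimal Python sketch, not Kudu's actual C++ code. `replace_file_durably` imitates the current file-per-record pattern (write a temp file, fsync, rename), while `append_batch` and `load_log` show one possible KV-style alternative: a single append-only log where many records share one fsync and startup is one sequential read. All function names and the record framing are hypothetical, chosen only for illustration.

```python
import os
import struct
import tempfile
from typing import Dict


def replace_file_durably(path: str, payload: bytes) -> None:
    """File-per-record pattern: temp file, fsync, rename over the old version."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # one fsync for every single record
        os.replace(tmp_path, path)  # atomic rename onto the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is durable (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def append_batch(log_path: str, records: Dict[str, bytes]) -> None:
    """Hypothetical alternative: append many records, pay one fsync per batch."""
    with open(log_path, "ab") as log:
        for key, value in records.items():
            raw_key = key.encode("utf-8")
            # Frame each record as: key length, value length, key, value.
            log.write(struct.pack("<II", len(raw_key), len(value)))
            log.write(raw_key)
            log.write(value)
        log.flush()
        os.fsync(log.fileno())


def load_log(log_path: str) -> Dict[str, bytes]:
    """Replay the log in one sequential pass; the latest record per key wins."""
    state: Dict[str, bytes] = {}
    with open(log_path, "rb") as log:
        while True:
            header = log.read(8)
            if len(header) < 8:
                break  # clean end of log, or a torn header at the tail
            key_len, value_len = struct.unpack("<II", header)
            raw_key = log.read(key_len)
            value = log.read(value_len)
            if len(raw_key) < key_len or len(value) < value_len:
                break  # torn record at the tail: ignore it
            state[raw_key.decode("utf-8")] = value
    return state
```

With the first function, updating N tablets costs at least N file fsyncs plus N renames; with the second, the same N updates can share a single fsync. A real store would also need checksums and log compaction, which this sketch leaves out.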

Solving the performance problem also gives us a lot of headroom to do things such as replicating the cmeta information for each tablet onto two or three disks, which has a lot of benefits for handling disk failure (for example, you can lose a disk without getting "amnesia" of a Raft vote).
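As a rough illustration of the replication idea, the Python sketch below writes the same consensus record into several directories (standing in for separate disks) and, on read, uses whichever surviving copy is most advanced. The file naming, the JSON encoding, and the "highest term wins, and a recorded vote beats no vote" rule are all assumptions made for this example, not something specified in this ticket.

```python
import json
import os
from typing import List, Optional


def _rank(record: dict) -> tuple:
    # Assumption: a higher term is newer; within a term, a recorded vote
    # supersedes "no vote yet".
    return (record["term"], record["voted_for"] is not None)


def write_replicated(dirs: List[str], tablet_id: str,
                     term: int, voted_for: Optional[str]) -> None:
    """Write the same cmeta record into every directory before returning."""
    payload = json.dumps({"term": term, "voted_for": voted_for}).encode("utf-8")
    for d in dirs:
        final_path = os.path.join(d, tablet_id + ".cmeta")  # hypothetical name
        tmp_path = final_path + ".tmp"
        with open(tmp_path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, final_path)


def read_replicated(dirs: List[str], tablet_id: str) -> Optional[dict]:
    """Return the most advanced readable copy, skipping failed disks."""
    best = None
    for d in dirs:
        path = os.path.join(d, tablet_id + ".cmeta")
        try:
            with open(path, "rb") as f:
                candidate = json.loads(f.read())
        except (OSError, ValueError):
            continue  # directory gone, file missing, or copy corrupt
        if best is None or _rank(candidate) > _rank(best):
            best = candidate
    return best
```

As long as at least one copy survives, the vote is not forgotten. Note that this multiplies the number of fsyncs per update by the number of copies, which is why it only becomes attractive once the per-update cost is low.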

Issue Links

is duplicated by:
- KUDU-2174 More efficient storage for ConsensusMetadata (Resolved)

is related to:
- KUDU-2195 Enforce durability happened before relationships on multiple disks (Open)
- KUDU-2117 Spread metadata across multiple data directories (Open)
- KUDU-2708 Possible contention creating temporary files while flushing cmeta during an election storm (Open)

relates to:
- KUDU-2278 Improve IO for writing deltas (Open)

People

Assignee: Todd Lipcon
Reporter: Todd Lipcon
Votes: 0
Watchers: 4

Dates

Created: 28/Oct/17 07:29
Updated: 02/Jun/20 20:54