[KUDU-1967] Umbrella JIRA for node density improvements - ASF JIRA

For the Kudu 1.4 release, I'll be working to improve node density.

Here's a brief primer on Kudu's scalability targets today:

We recommend no more than 4 TB of total data per node. This limit applies to Kudu data blocks on disk, so it is measured after encoding and compression.
We recommend no more than 1000 partitions (post-replication) per node.
We recommend no more than 100 nodes per cluster.
We recommend no more than 60 partitions per table per tserver.
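
As a rough illustration of how these four recommendations combine, the sketch below checks a planned cluster against them. It is not part of Kudu: `TableSpec` and `check_cluster` are hypothetical names, replicas are assumed to be spread evenly across nodes, and the per-table limit is assumed to count replicas (the text above does not say whether it is pre- or post-replication).

```python
from dataclasses import dataclass

# Recommended limits quoted from the list above.
MAX_DATA_TB_PER_NODE = 4
MAX_REPLICAS_PER_NODE = 1000
MAX_NODES_PER_CLUSTER = 100
MAX_PARTITIONS_PER_TABLE_PER_TSERVER = 60


@dataclass
class TableSpec:
    """Hypothetical description of one table; not a Kudu API type."""
    name: str
    partitions: int          # partitions before replication
    replication_factor: int  # replicas per partition
    data_tb: float           # size of one copy, post-encoding and post-compression


def check_cluster(tables, num_nodes):
    """Return a list of warnings; an empty list means all limits are met.

    Assumes replicas and data are balanced evenly across nodes.
    """
    warnings = []

    if num_nodes > MAX_NODES_PER_CLUSTER:
        warnings.append(f"{num_nodes} nodes exceeds {MAX_NODES_PER_CLUSTER}")

    # Partitions per node are counted post-replication.
    total_replicas = sum(t.partitions * t.replication_factor for t in tables)
    replicas_per_node = total_replicas / num_nodes
    if replicas_per_node > MAX_REPLICAS_PER_NODE:
        warnings.append(f"{replicas_per_node:.0f} replicas per node exceeds "
                        f"{MAX_REPLICAS_PER_NODE}")

    # Every replica stores its own copy of the data.
    total_tb = sum(t.data_tb * t.replication_factor for t in tables)
    tb_per_node = total_tb / num_nodes
    if tb_per_node > MAX_DATA_TB_PER_NODE:
        warnings.append(f"{tb_per_node:.1f} TB per node exceeds "
                        f"{MAX_DATA_TB_PER_NODE} TB")

    # Assumption: the per-table limit is applied to replicas.
    for t in tables:
        per_tserver = t.partitions * t.replication_factor / num_nodes
        if per_tserver > MAX_PARTITIONS_PER_TABLE_PER_TSERVER:
            warnings.append(f"table {t.name}: {per_tserver:.0f} partitions per "
                            f"tserver exceeds "
                            f"{MAX_PARTITIONS_PER_TABLE_PER_TSERVER}")

    return warnings


if __name__ == "__main__":
    plan = [TableSpec("events", partitions=200, replication_factor=3, data_tb=2.0)]
    for warning in check_cluster(plan, num_nodes=10):
        print(warning)
```

Under the even-balance assumption, a single table with 200 partitions at replication factor 3 on 10 nodes sits exactly at 60 replicas per tserver, so it passes all four checks. Real clusters are rarely perfectly balanced, so treat the result as an optimistic estimate.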

For 1.4, here's what we'd like to achieve:

Up to 16 TB of total data per node. Maybe even 48 TB, if possible.
Up to 100 "hot" partitions per node. In this context, "hot" means partitions that are actively servicing writes.
Thousands of "cold" partitions per node. Put another way, it should be drastically cheaper to serve "cold" partitions than it is today.
Maintain the "100 nodes per cluster" limit.
Remove the "no more than 60 partitions per table per tserver" limit.

I'll be linking various interesting JIRAs into this one, and I'll document, for each one, which aspect of data scalability it affects.

#	Summary	Status	Assignee
1.	Integration test for data scalability	Resolved	Adar Dembo
2.	Tablet server runs out of threads when creating lots of tablets	Resolved	Adar Dembo
3.	Reduce Kudu WAL log disk usage	Resolved	Todd Lipcon
4.	GetTableSchema() is O(n) in the number of tablets	Resolved	Adar Dembo
5.	Reduce impact of enabling fsync on the master	Resolved	Adar Dembo
6.	LBM should log startup progress periodically	Resolved	Samuel Okrent
7.	LBM should start up faster	Resolved	Adar Dembo
8.	bootstrap should not replay logs that are known to be fully flushed	Open	Andrew Wong
9.	Explore reducing number of data blocks by tuning existing parameters	Open	Unassigned
10.	Explore ways to reduce maintenance manager CPU load	Open	Unassigned
11.	Coalesce RPCs destined for the same server	Open	Ashwani Raina
12.	Improve web UI experience with many tablets	Resolved	William Berkeley
13.	Skip entire WAL files or sections of WAL files that do not need to be replayed	Open	Unassigned