We've seen multiple deployments suffer from the fact of life that data centers don't always have uniform hardware. Often times, racks are comprised of whatever hardware we can salvage from other projects. As such, Kudu's assumption that all tablet servers should be treated equally (sans location awareness) can be a bad one.
There are a few pieces to making this better:
- Having Kudu determine the relative capacities of each tablet servers (either automatically, or as input by an operator)
- Updating the replica placement policy to account for capacity across tablet servers
- Bonus: have Kudu account for the current size used on each tablet server
Some things that might be worth considering:
- It seems reasonable to assume that each data directory is independent of one another, so we should be able to determine with relative ease the total capacity of a server by aggregating the total capacities of its data directories. This doesn't account for colocated WAL directories, but that might be a fine limitation, since we expect WAL usage to vary wildly as ingest workloads vary. The capacity could be heartbeated to masters periodically, or fetched via RPC by rebalancer tooling.
- Updating the placement policy seems trickier, since there are a lot of nice properties with using the PO2C algorithm (e.g. eventual fixing of skew), and with assuming that all tablets have equal weight (e.g. it's harder to fall into the trap of moving a replica, only to move it back). Some variant of PO2C, but based on available space instead of replica count might be worth considering for initial placement and for defining balance.