One way to scale to larger on-disk data sets is to reduce the number of data blocks per unit of data; that is, to make data blocks larger. Two existing parameters control this:
- budgeted_compaction_target_rowset_size: within a given flush or compaction operation, stipulates the target size of each rowset. Currently 32 MB.
- tablet_compaction_budget_mb: stipulates the amount of data that should be included in any given compaction. Currently 128 MB.
It might be interesting to explore raising these.
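
As a rough illustration of why raising the rowset size could help, the sketch below estimates how many rowsets a fixed amount of on-disk data would be split into at different target sizes. It is back-of-envelope arithmetic only: the function name, the 1 TiB data size, and the candidate values of 64 and 128 are hypothetical, the "M" units are assumed to mean MiB, and it assumes every rowset is written at exactly the target size (real rowsets will vary).

```python
MIB = 1024 * 1024


def estimated_rowsets(data_bytes: int, target_rowset_bytes: int) -> int:
    """Estimate the rowset count if every rowset hits the target size.

    Hypothetical helper for illustration; uses ceiling division so a
    partially filled final rowset still counts as one rowset.
    """
    if target_rowset_bytes <= 0:
        raise ValueError("target_rowset_bytes must be positive")
    return -(-data_bytes // target_rowset_bytes)


if __name__ == "__main__":
    data_bytes = 1024 * 1024 * MIB  # 1 TiB of on-disk data (assumed figure)
    for target_mib in (32, 64, 128):  # 32 is the current value; others are hypothetical
        count = estimated_rowsets(data_bytes, target_mib * MIB)
        print(f"target rowset size {target_mib} MiB: about {count} rowsets")
```

Under these assumptions the rowset count falls in inverse proportion to the target size, so the number of data blocks should fall roughly in step, to the extent that the number of blocks per rowset stays about constant.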