[OAK-6381] Improved index analysis tools - ASF JIRA

It would be good to have more tools to analyze indexes:

For Lucene indexes, get a histogram of sampled terms. We have "getFieldInfo", which shows how common each field is, but nothing equivalent for terms. For example, the /oak:index/lucene index contains 1 million fulltext fields and node names for 1 million nodes, but I wonder why, what typical node names are, and whether fulltext is actually empty for most nodes. Maybe add a new method "getTermHistogram(int sampleCount)" or similar.
For property indexes, the number of updated nodes per second or so. Right now we can only analyze the counts per key, but some indexes / keys are very volatile (they see many short-lived entries).
For Lucene indexes, writes per second or so (in MB).
How indexes are used (approximate nodes read / MB read per hour).
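
To make the first item more concrete, here is a rough sketch of what a sampled term histogram could look like. It is not Oak or Lucene code: the class name TermHistogramSketch, the Iterator<String> input, and the Random parameter are all assumptions made for illustration, and only the method name "getTermHistogram" and the sampleCount argument come from the proposal above. The sketch draws a uniform sample with reservoir sampling, so the whole term stream never has to be held in memory, and then counts how often each distinct term occurs in the sample.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper, not part of Oak or Lucene.
public class TermHistogramSketch {

    /**
     * Samples up to sampleCount terms uniformly from the stream (reservoir
     * sampling) and returns term -> occurrences in the sample, ordered by
     * descending count, with ties broken alphabetically.
     */
    public static Map<String, Integer> getTermHistogram(
            Iterator<String> terms, int sampleCount, Random random) {
        if (sampleCount <= 0) {
            throw new IllegalArgumentException("sampleCount must be positive");
        }
        List<String> reservoir = new ArrayList<>(Math.min(sampleCount, 1024));
        long seen = 0;
        while (terms.hasNext()) {
            String term = terms.next();
            seen++;
            if (reservoir.size() < sampleCount) {
                // Fill the reservoir with the first sampleCount terms.
                reservoir.add(term);
            } else {
                // Keep the new term with probability sampleCount / seen.
                long slot = (long) (random.nextDouble() * seen);
                if (slot < sampleCount) {
                    reservoir.set((int) slot, term);
                }
            }
        }

        // Count occurrences of each distinct term within the sample.
        Map<String, Integer> counts = new HashMap<>();
        for (String term : reservoir) {
            counts.merge(term, 1, Integer::sum);
        }

        // Order by descending count so the most common terms come first.
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> histogram = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : sorted) {
            histogram.put(entry.getKey(), entry.getValue());
        }
        return histogram;
    }
}
```

If sampleCount is at least the number of terms in the stream, the counts are exact; otherwise they are estimates whose total equals sampleCount. An empty string showing up as the top entry would point at the "fulltext for most nodes is actually empty" case. In a real implementation the iterator would presumably be fed from the index's term data for one field, which this sketch leaves open.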

is related to

OAK-9781 Lucene Index MBean getFieldTerms Excludes Results for Unique Fields