After several months of debugging and tuning the balancer and normalizer on a large production cluster, we found that visualizations of the current region state were very useful for understanding behavior and quantifying the improvements we made along the way. Specifically, a chart of total assigned region count and total assigned store file size per table per host was immensely useful for tuning the balancer, and histograms of store file size made normalizer activity much more intuitive to understand.
Our scripts parsed the output of the shell's status 'detailed' command, extracted the desired metric, and produced charts. I'd like to build equivalent functionality into the master UI, with data coming directly from the ClusterMetrics object and rendered as an interactive chart in the browser.
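
As a rough illustration of the aggregation step, here is a minimal Python sketch. It assumes the per-region data has already been extracted into simple records; the `RegionSample` type, its field names, and the fixed 1024 MB bucket width are all hypothetical and are not taken from the HBase API or from the original scripts. The first function produces the per-table, per-host totals behind the balancer chart, and the second produces the store file size histogram used to follow normalizer activity.

```python
from collections import defaultdict
from typing import NamedTuple


class RegionSample(NamedTuple):
    # Hypothetical record; field names are illustrative, not HBase API names.
    host: str
    table: str
    store_file_size_mb: int


def aggregate_by_table_and_host(samples):
    """Return {(table, host): [region_count, total_store_file_size_mb]}."""
    totals = defaultdict(lambda: [0, 0])
    for s in samples:
        entry = totals[(s.table, s.host)]
        entry[0] += 1                      # one more region assigned here
        entry[1] += s.store_file_size_mb   # accumulate store file size
    return dict(totals)


def size_histogram(samples, bucket_mb=1024):
    """Count regions per fixed-width store file size bucket.

    Keys are the lower bound of each bucket in MB. The bucket width is an
    assumed default, chosen only for illustration.
    """
    buckets = defaultdict(int)
    for s in samples:
        buckets[(s.store_file_size_mb // bucket_mb) * bucket_mb] += 1
    return dict(sorted(buckets.items()))
```

Either result can be serialized to JSON and passed to whichever charting library renders in the browser; the choice of library is left open here.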