We're currently graphing both the mean datanode disk usage and the standard deviation of individual datanodes from that mean, using a script that parses the output of 'dfsadmin -report'. Our DFS cluster nodes all have the same amount of disk space, so you'd expect the mean across individual datanodes to equal the overall % DFS full, but it's not quite the same. We haven't yet looked into why.

To directly answer Konstantin's question, the one line we're graphing is the standard deviation.

We can calculate the variance of space usage by parsing "dfsadmin -report", but it would be good to have it in the web UI. It would also be good to expose it as a Hadoop metric.
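For anyone who wants to do the same in the meantime, here is a rough sketch of the kind of parsing script described above. It assumes the report prints a "DFS Used%" line per datanode (with the first occurrence being the cluster-wide summary); the exact field names and layout vary across Hadoop versions, so the regex may need adjusting.

```python
import re
import statistics

def usage_stats(report_text):
    """Return (mean, stdev) of per-datanode 'DFS Used%' values.

    Assumes 'dfsadmin -report' output where the first 'DFS Used%'
    line is the cluster summary and the rest are per-node lines.
    """
    pcts = [float(m) for m in re.findall(r"DFS Used%:\s*([\d.]+)%", report_text)]
    nodes = pcts[1:] if len(pcts) > 1 else pcts  # skip cluster summary line
    mean = statistics.mean(nodes)
    stdev = statistics.pstdev(nodes)  # population std dev across nodes
    return mean, stdev

# Example against a trimmed-down, made-up report:
sample_report = """\
DFS Used%: 50.00%
Name: 10.0.0.1:50010
DFS Used%: 40.00%
Name: 10.0.0.2:50010
DFS Used%: 60.00%
"""
mean, stdev = usage_stats(sample_report)
```

With the two nodes above at 40% and 60%, this yields a mean of 50.0 and a standard deviation of 10.0, which is the single number we graph.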