When working with Mahout text clustering, I keep writing code similar to the contents of
public static String getTopFeatures(Cluster cluster, String dictionary, int numTerms)
in ClusterDumper in order to determine cluster labels.
I think it would be useful if (parts of) this code were added to the cluster or vector API, so that you could do something like
Cluster cluster = ... // get the cluster from seq file iterable
String clusterLabel = cluster.getTopTerms(1, dictionary); // Do something with the label
I think this would make it easier to export and post-process clustering results, for example to index them or store them elsewhere.
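To illustrate what such a method might do internally, here is a minimal sketch of the top-terms selection. It does not use Mahout classes: the centroid is assumed to be available as a plain double[] of term weights and the dictionary as a String[] indexed by term id. The class name TopTerms and the method topTerms are hypothetical and only stand in for the proposed getTopTerms.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the Mahout API.
public class TopTerms {

    // Returns up to numTerms dictionary entries, ordered by descending weight.
    // Assumption: weights[i] is the centroid weight of the term dictionary[i].
    public static List<String> topTerms(double[] weights, String[] dictionary, int numTerms) {
        // Collect the indices of all terms with a non-zero weight.
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < weights.length && i < dictionary.length; i++) {
            if (weights[i] != 0.0) {
                indices.add(i);
            }
        }

        // Sort the indices so that the heaviest terms come first.
        indices.sort(Comparator.comparingDouble((Integer i) -> weights[i]).reversed());

        // Map the leading indices back to their terms.
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < Math.min(numTerms, indices.size()); i++) {
            terms.add(dictionary[indices.get(i)]);
        }
        return terms;
    }

    public static void main(String[] args) {
        double[] weights = {0.1, 0.9, 0.0, 0.5};
        String[] dictionary = {"apple", "mahout", "zebra", "cluster"};
        System.out.println(topTerms(weights, dictionary, 2)); // prints [mahout, cluster]
    }
}
```

With numTerms set to 1, the single returned term could serve directly as the cluster label, as in the snippet above.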