I can't really post my client code since it's intertwined with a bunch of other stuff, but I extracted the important parts into a JUnit test that I attached to this issue. We run Java (Tomcat), so it's fairly easy to talk directly to HBase and integrate a few features into our admin console: printing friendly record names rather than escaped bytes, triggering backups, moving regions, etc. I don't think it requires knowing the keyspace ahead of time, just that you hash into a known output range, a 63-bit long in my example.
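To make the "known output range" point concrete, here's a minimal sketch of what I mean by hashing arbitrary row-key bytes into the non-negative 63-bit long range. The `hash63` helper and its seed constant are just illustrative, not what my attached test actually uses:

```java
import java.nio.charset.StandardCharsets;

public class KeyHash {
    // Hypothetical helper: fold arbitrary row-key bytes into a long,
    // then clear the sign bit so the result lands in [0, Long.MAX_VALUE].
    static long hash63(byte[] key) {
        long h = 1125899906842597L; // arbitrary prime seed
        for (byte b : key) {
            h = 31 * h + b;
        }
        return h & Long.MAX_VALUE; // 63-bit, always non-negative
    }

    public static void main(String[] args) {
        long h = hash63("row-0001".getBytes(StandardCharsets.UTF_8));
        System.out.println(h >= 0);
    }
}
```

The important part is only that the output range is fixed and known up front, regardless of what the keys themselves look like.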
I think the consistent hashing scheme may be a good out-of-the-box default. Even with something smarter, I'd worry about the underlying algorithms getting off course and starting a death spiral as bad outputs are fed back in, creating even worse outputs. Something like consistent hashing could be a good beacon to always be steering towards so things don't get too far off course.
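Something like the following is what I have in mind for the "beacon". This is a hedged sketch of a standard consistent-hash ring, not HBase code: each server gets several virtual points on the 63-bit ring, and a region's home is the first point at or after the region's hash, wrapping around. The class and method names are made up for illustration:

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentRing {
    // ring position -> server; multiple virtual points per server
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentRing(List<String> servers, int pointsPerServer) {
        for (String s : servers) {
            for (int i = 0; i < pointsPerServer; i++) {
                ring.put(hash63((s + "#" + i).getBytes()), s);
            }
        }
    }

    // A region's "home" server: first ring point at or after its hash,
    // wrapping to the ring's first point if we fall off the end.
    String homeFor(long regionHash) {
        SortedMap<Long, String> tail = ring.tailMap(regionHash);
        return tail.isEmpty() ? ring.firstEntry().getValue()
                              : tail.get(tail.firstKey());
    }

    static long hash63(byte[] key) {
        long h = 1125899906842597L; // arbitrary prime seed
        for (byte b : key) h = 31 * h + b;
        return h & Long.MAX_VALUE;
    }
}
```

The nice property is that the mapping is stable: no matter how far a smarter balancer has drifted, `homeFor` always gives the same answer to steer back towards.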
I have about 20 tables with many different access patterns and I can't envision an algorithm that balances them truly well. Everything could be going fine until I kick off an MR job that randomly digs up 100 very cold regions and find that they're all on the same server.
I'm thinking of a system where each region is either at home (its consistent hash destination) or visiting another server because the balancer decided its home was too hot. Each regionserver could identify its hotter regions, and the balancer could move these around in an effort to smooth out the load. In the meantime, colder regions would stay well distributed based on how good the hashing mechanism is. If a regionserver cools down, the master brings home its vacationing regions first, and if it's still cool, then it borrows someone else's hotter home regions. Without an underlying scheme, I can envision things getting extremely chaotic, especially with regards to cold regions of a single table getting bundled up since they're being overlooked. With this method, you're never too far from safely hitting the reset button.
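A rough sketch of the home/visiting idea, to make the states and the "reset button" explicit. All of the names here are hypothetical; it's just the shape of the state machine, not a proposed implementation:

```java
import java.util.Collection;

public class HomeAwayBalancer {
    // A region is either HOME (on its consistent-hash destination)
    // or VISITING (relocated because its home server was too hot).
    enum State { HOME, VISITING }

    static final class Region {
        final String name;
        final String homeServer;   // consistent-hash destination
        String currentServer;      // where it actually lives right now
        boolean hot;               // flagged by its regionserver

        Region(String name, String home) {
            this.name = name;
            this.homeServer = home;
            this.currentServer = home;
        }

        State state() {
            return currentServer.equals(homeServer) ? State.HOME : State.VISITING;
        }
    }

    // Balancer sends a hot region to visit a cooler server;
    // cold regions are never touched, so they stay hash-distributed.
    static void offload(Region r, String coolerServer) {
        if (r.hot) r.currentServer = coolerServer;
    }

    // The "reset button": every region returns to its hash destination.
    static void bringAllHome(Collection<Region> regions) {
        for (Region r : regions) r.currentServer = r.homeServer;
    }
}
```

The point is that `bringAllHome` is always a safe, well-distributed state to fall back to, no matter what the hot-region shuffling has done in the meantime.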
Regarding your comment about moving the top or bottom child off the parent server after a split, I tend to prefer moving the bottom one. With time series data, writes keep going to the bottom child, so if you don't move it then a single server ends up doing the appending forever. I prefer to rotate the server that's doing the work even though it's not quite as efficient and may cause a longer split pause... it makes for a more balanced cluster.