On large clusters, the NameNode can become a performance bottleneck. The NameNode is also a single point of failure. Recent improvements to HDFS to support High Availability and Federation [See ACCUMULO-118] help address these issues, but at the cost of greater administrative overhead and specialized hardware.
We have seen demonstrations of using HBase to host a NameNode. There's Aaron Cordova's example of a Distributed Name Node:
Design for a Distributed Name Node
Dynamic Namespace Partitioning with Giraffa File System
We could incrementally implement a self-hosted Accumulo, which would run as its own NameNode. This would be useful for large Accumulo installations. Over the long term, we could incorporate all NameNode functions to provide a scalable, distributed NameNode for other large Hadoop installations.
Hopefully the approach used could be trivially ported to HBase as well.
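As a rough illustration of why a sorted key-value store is a plausible home for NameNode metadata, here is a minimal sketch in Python. It is not taken from either design linked above and does not use the Accumulo API. The class NamespaceTable, its methods, and the row-key layout (parent directory, a NUL byte, then the entry name) are all hypothetical, chosen only so that the children of a directory sort next to each other and a directory listing becomes a single range scan.

```python
import bisect


class NamespaceTable:
    """Toy stand-in for a sorted key-value table holding file-system metadata.

    Hypothetical illustration only: a real implementation would store these
    rows in a distributed sorted table rather than in local Python structures.
    """

    def __init__(self):
        self._keys = []    # row keys, kept in sorted order
        self._values = {}  # row key -> metadata dict

    @staticmethod
    def _row_key(path):
        # Assumed key layout: "<parent>\x00<name>", so that all children of
        # one directory form a contiguous range in sorted order.
        parent, _, name = path.rstrip("/").rpartition("/")
        return (parent or "/") + "\x00" + name

    def create(self, path, **metadata):
        """Insert or overwrite the metadata row for a path."""
        key = self._row_key(path)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = metadata

    def stat(self, path):
        """Point lookup: return the metadata for a path, or None."""
        return self._values.get(self._row_key(path))

    def list_dir(self, directory):
        """Range scan: return the names of the direct children of a directory."""
        prefix = (directory.rstrip("/") or "/") + "\x00"
        start = bisect.bisect_left(self._keys, prefix)
        names = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # left the contiguous range for this directory
            names.append(key[len(prefix):])
        return names


table = NamespaceTable()
table.create("/accumulo", type="dir")
table.create("/accumulo/tables", type="dir")
table.create("/accumulo/tables/2", type="dir")
table.create("/accumulo/tables/1", type="dir")
table.create("/accumulo/tables/1/f.rf", type="file", size=10)
print(table.list_dir("/accumulo/tables"))
```

In this layout a stat is a point lookup and a directory listing is a prefix scan, which are the two access patterns a sorted table serves well. Operations such as renaming a directory would need more thought, since every descendant row is keyed by its parent path.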