I'm not aware of any real clusters where rescan time is an issue right now. Just to give some rough numbers: if you have 100 GB of HDFS cache per DataNode and blocks of size 128 MB, you'd have 800 cached blocks per DN. That's pretty manageable. Even with 100 nodes, you still only have 80,000 cached blocks to look at... certainly not something that would take anywhere near 30 seconds (unless a major GC hits).
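The rough numbers above work out like this (a trivial check, using GiB/MiB units; the class name is just for illustration):

```java
// Quick arithmetic check for the rough numbers above.
public class RescanMath {
    public static void main(String[] args) {
        long cacheBytes = 100L * 1024 * 1024 * 1024; // 100 GiB of cache per DN
        long blockBytes = 128L * 1024 * 1024;        // 128 MiB block size
        long blocksPerDn = cacheBytes / blockBytes;  // = 800
        long clusterBlocks = blocksPerDn * 100;      // 100 DNs -> 80,000 blocks
        System.out.println(blocksPerDn + " " + clusterBlocks);
    }
}
```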
There are two directions we could go in optimizing this. One is to avoid holding the lock during the entire cache rescan. This is something we talked about, but didn't quite get around to implementing, since it makes some things a lot trickier. We'd need some way of handling the list of cached blocks changing between lock acquisitions.
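One way to avoid holding the lock for the whole rescan is to scan in chunks, re-acquiring the lock per chunk. This is a hypothetical sketch, not real HDFS code (the class, field names, and chunk size are all invented); the key point is that blocks deleted between chunks get skipped, and blocks added after the snapshot get picked up by the next rescan:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: rescan cached blocks in fixed-size chunks,
// releasing the lock between chunks so other operations can interleave.
public class ChunkedRescan {
    static final int CHUNK_SIZE = 1000;
    final ReentrantLock lock = new ReentrantLock();      // stands in for the namesystem lock
    final Map<Long, Boolean> cachedBlocks = new HashMap<>(); // blockId -> cached?

    int rescan() {
        List<Long> ids;
        lock.lock();
        try {
            ids = new ArrayList<>(cachedBlocks.keySet()); // snapshot of block ids
        } finally {
            lock.unlock();
        }
        int examined = 0;
        for (int i = 0; i < ids.size(); i += CHUNK_SIZE) {
            lock.lock(); // re-acquire per chunk; list may have changed meanwhile
            try {
                int end = Math.min(i + CHUNK_SIZE, ids.size());
                for (Long id : ids.subList(i, end)) {
                    if (!cachedBlocks.containsKey(id)) {
                        continue; // block deleted since the snapshot: skip it
                    }
                    examined++; // real code would recompute caching decisions here
                }
            } finally {
                lock.unlock();
            }
        }
        return examined;
    }
}
```

Blocks added between the snapshot and the end of the scan are simply missed until the next rescan, which is acceptable for a periodic background task.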
Another is to be more "incremental." Operations that renamed, deleted, or added files could check with the cache manager to see whether their actions changed what should be cached, and update the cache state accordingly. This seems easy, but is actually very difficult. We'd have to hook into almost every FSNamesystem operation. We'd also have to figure out how to make caching decisions "incrementally," which is not easy. I think even with an incremental system, we'd want to keep the rescan around as a backstop against any bugs in the incremental accounting. Maybe it would run much less frequently, like every 15 minutes or so.
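The shape of the incremental approach, with the rescan as a backstop, might look something like this. Again a hypothetical sketch (all names invented, and the bookkeeping reduced to a single byte counter); the point is that every namespace operation calls a hook, and a periodic full rescan recomputes the totals from scratch so it can catch drift caused by a buggy hook:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: namespace operations notify the cache manager so
// cached-byte accounting stays up to date incrementally; fullRescan() is
// the backstop that recomputes from scratch and detects drift.
public class IncrementalCacheAccounting {
    final Set<String> cachedPaths = new HashSet<>();     // paths under a cache directive
    final Map<String, Long> fileSizes = new HashMap<>(); // path -> length
    long cachedBytes = 0; // incrementally maintained total

    void onAdd(String path, long size, boolean cached) {
        fileSizes.put(path, size);
        if (cached) {
            cachedPaths.add(path);
            cachedBytes += size;
        }
    }

    void onDelete(String path) {
        Long size = fileSizes.remove(path);
        if (size != null && cachedPaths.remove(path)) {
            cachedBytes -= size;
        }
    }

    void onRename(String src, String dst, boolean dstCached) {
        Long size = fileSizes.remove(src);
        if (size == null) return;
        if (cachedPaths.remove(src)) {
            cachedBytes -= size;
        }
        onAdd(dst, size, dstCached); // a rename may move a file into or out of a cached directory
    }

    // Backstop: recompute the total from scratch. Returns true if the
    // incremental total had drifted, i.e. a bug in the hooks above.
    boolean fullRescan() {
        long total = 0;
        for (String p : cachedPaths) {
            total += fileSizes.get(p);
        }
        boolean drifted = (total != cachedBytes);
        cachedBytes = total;
        return drifted;
    }
}
```

The difficulty the comment describes is exactly the hook surface: in real HDFS, every FSNamesystem code path that touches the namespace would need to call the equivalent of these hooks correctly, which is why the periodic rescan stays around as insurance.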