This is a good and long-awaited feature, previously filed as HADOOP-95. I vote to check it in because, as you say, it is much better than anything we have, and it is critically important.
Regarding performance, the nameserver clearly will not be overwhelmed, but the operation may take a very long time to execute. It's one thing to traverse a million entries in memory (for a modest 32TB FS), but quite another to execute a hundred thousand RPC calls from a single client. Also, the implementation will need to change once the open command stops returning the entire list of blocks. We want that change in order to shorten the time it takes to open a file, especially when reading just a few blocks from a very large file.
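To put rough numbers on this, here is a back-of-envelope sketch. The entry and RPC counts are the ones quoted above; the 10 ms per RPC round trip and 1 microsecond per in-memory entry are purely my assumptions for illustration, not measurements, and the function names are made up.

```python
# Back-of-envelope comparison of a client-driven check vs. an in-memory scan.
# The latency figures below are ASSUMPTIONS for illustration, not measurements.

TB = 10**12  # decimal units keep the arithmetic exact
MB = 10**6


def implied_block_size_mb(fs_size_bytes, block_entries):
    """Average block size implied by the FS size and the number of entries."""
    return fs_size_bytes / block_entries / MB


def client_side_seconds(rpc_calls, rpc_latency_ms):
    """Total time if one client issues the RPCs strictly one after another."""
    return rpc_calls * rpc_latency_ms / 1000


def in_memory_seconds(entries, visit_us):
    """Total time to visit every entry once in the server's memory."""
    return entries * visit_us / 1_000_000


block_mb = implied_block_size_mb(32 * TB, 1_000_000)   # figures from the comment
client_s = client_side_seconds(100_000, 10)            # ASSUMED 10 ms per RPC
server_s = in_memory_seconds(1_000_000, 1)             # ASSUMED 1 us per entry

print(f"implied block size: {block_mb:.0f} MB")
print(f"client-side check:  {client_s:.0f} s (about {client_s / 60:.0f} min)")
print(f"in-memory scan:     {server_s:.0f} s")
```

Under those assumed latencies the client-driven check is about three orders of magnitude slower than the in-memory scan. The exact ratio depends entirely on the real RPC cost, but the gap is unlikely to disappear.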
Lastly, there's extensibility. We'll want to test for things that are visible only on the nameserver, such as blocks that are not used by any file.
Wouldn't it be better to request the server to execute this code internally, and report results either to the client or to a local file?
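As a minimal sketch of what such an internal check could look like: one in-memory pass over the namespace that reports both files with missing blocks and blocks not used by any file. The function name `check_namespace` and the two data structures are hypothetical stand-ins for whatever the server actually keeps in memory, not existing Hadoop APIs.

```python
def check_namespace(files, known_blocks):
    """Run a consistency check entirely in memory.

    files        -- HYPOTHETICAL dict: file path -> list of block ids
    known_blocks -- HYPOTHETICAL set of block ids the server tracks
    """
    referenced = set()
    missing = {}  # path -> block ids the file needs but the server lacks

    for path, blocks in files.items():
        for block_id in blocks:
            referenced.add(block_id)
            if block_id not in known_blocks:
                missing.setdefault(path, []).append(block_id)

    # Blocks the server tracks that no file refers to. A client walking
    # the file tree cannot see these, because it only learns about blocks
    # by asking about files.
    orphaned = sorted(known_blocks - referenced)

    return {"files_checked": len(files), "missing": missing, "orphaned": orphaned}


# Tiny made-up namespace for illustration.
report = check_namespace(
    files={"/a": [1, 2], "/b": [3, 9]},
    known_blocks={1, 2, 3, 4},
)
print(report)
```

The report is a plain dictionary, so the server could just as easily send it back to the client or write it to a local file.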