Tsz Wo said:
For internal methods like getPathComponents(..), the parameters are often not validated for performance reasons
While I appreciate the importance of performance, I think in all but the very tightest loops, constant-time (O(1)) parameter validation is worth it. NN metadata ops are almost never the bottleneck of a cluster, and I can't imagine a case where an extra memory access or two would comprise a measurable portion of request latency. Given this, I think it's far more important to focus on stability, reasonable error messages, and preventing regressions due to API misuse than it is to worry about extra O(1) method calls.
I do agree that parameter validation should be happening at the outer layers of the API, but since I somehow managed to get the NPE at this point, I figured this API was somehow publicly accessible. See below.
Regarding throwing a runtime exception, is it really different from NPE? Wouldn't any runtime exception in NN indicate a bug in NN?
Yes. For me, the difference in IAE vs NPE is that it's explicitly "intended" and offers some helpful explanation to the developer about what went wrong in their code. An NPE on the other hand could be an unknown internal bug, or a misuse of API, or any number of other things.
Can a user command/user code actually trigger this NPE? If yes, then it probably should be a regular error (IOException etc.) rather than a runtime exception.
You can trigger this NPE if you have a reference to the NameNode (through ClientProtocol). It looks to me like you could even trigger it through DFSClient.exists or DFSClient.getFileInfo, though I'd have to write a test case to confirm. If people think this error should be caught in NameNode, FSNamesystem, or FSDirectory instead, then I agree - just let me know which place makes the most sense and I'll move the code there and write an appropriate unit test.
How did you notice this NPE?
I triggered it by calling namenode.getFileInfo() on a relative path from within my thrift contrib code (HADOOP-4707). This was an error in my code, since I was misusing the API, but it took me some serious digging to discover that the lack of initial '/' is what caused it. If the error had been the IAE instead I would have found my error much quicker.
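To make the kind of check I mean concrete, here is a minimal standalone sketch. It is not the actual patch or the real NameNode code: the class name PathValidationSketch and the body of getPathComponents here are hypothetical stand-ins, and the splitting logic is only assumed to resemble what the real method does. The point is just that a single O(1) check up front turns an opaque NPE into an error that names the problem.

```java
import java.util.Arrays;

public class PathValidationSketch {

    // Hypothetical stand-in for an internal path-splitting helper.
    // The only part that matters here is the constant-time check at the top.
    static String[] getPathComponents(String path) {
        if (path == null || !path.startsWith("/")) {
            // Explicit, "intended" failure that tells the caller what is wrong.
            throw new IllegalArgumentException(
                "Path must be absolute (start with '/'): " + path);
        }
        // Assumed splitting behavior, for illustration only.
        return path.split("/");
    }

    public static void main(String[] args) {
        // An absolute path is accepted and split into components.
        System.out.println(Arrays.toString(getPathComponents("/user/todd/file")));

        // A relative path is rejected with a descriptive message.
        try {
            getPathComponents("user/todd/file");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With a check like this, the relative-path mistake above would have surfaced as a message mentioning the missing leading '/', wherever the check ends up living.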