(btw, I previously tried to imply that my neg was withdrawn, but I want to make it unambiguously clear. I really like the idea, as evidenced by my own jira being duped to this one.)
From googling it looked like "jdk1.6.0" was a stable path on Solaris, ie no glob necessary. Maybe someone who uses Solaris can verify, I'm fine punting Solaris to a future change as well.
My only concern is that if jdk1.6.1 or jdk1.6.2 is installed, this implementation would still default to 1.6.0. What if 1.6.1, but not 1.6.0, is installed? Or what if only jdk1.7.* is installed and hadoop works fine with either java 6 or 7? In those cases the detection is of little use to the user. However, I don't know the Solaris conventions. Maybe they have a way to set a "default" java? Perhaps a symlink somewhere? I have no strong feelings, and punting would be fine, or maybe a Solaris user could chime in.
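To make the version concern concrete, here is a minimal sketch of glob-based selection that picks the highest matching version instead of a fixed "jdk1.6.0" path. It is in Python purely for illustration; the actual detection lives in the shell script. The helper names `version_key` and `newest_jdk`, and the default pattern, are hypothetical and not taken from the patch.

```python
import glob
import os
import re


def version_key(path):
    """Map a directory name such as 'jdk1.6.0_31' to a sortable tuple (1, 6, 0, 31)."""
    return tuple(int(n) for n in re.findall(r"\d+", os.path.basename(path)))


def newest_jdk(root, pattern="jdk1.6*"):
    """Return the highest-versioned directory under root matching pattern, or None.

    Both the function and the default pattern are hypothetical illustrations.
    """
    candidates = [
        p for p in glob.glob(os.path.join(root, pattern)) if os.path.isdir(p)
    ]
    return max(candidates, key=version_key) if candidates else None
```

With a pattern like this, a host that has only jdk1.6.1 is still detected, and widening the pattern to "jdk1.*" would also pick up a 1.7 install if that were considered acceptable.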
These globs should not result in a lot more paths being added. Eg how many paths would you expect "/usr/java/jdk1.6*" to match on most systems? Probably none, a couple would be a lot right? Even if these globs matched 20 paths per above it shouldn't impact performance.
Don't worry, I have no performance concerns. I fully agree that a few stat calls are minuscule in the overall execution. My mild concern is the maintenance burden of having to update the paths for newer versions of java. However, being a stickler for consistency, I am a little more concerned that updating the paths introduces inconsistent behavior in how the jdk is selected. Then again, perhaps Linux folks are accustomed to inconsistency?
bq. Linux - Same as SunOS. Also, why aren't /usr/java/default and /usr/lib/jvm/default-java checked first?
The code looks for Sun Java 6 first because Hadoop depends on Sun Java 6, ie we don't necessarily want the system default Java.
Only for the sake of discussion: if the goal is to be a "good citizen" of the host OS, then picking the system default is sufficient. This approach should eliminate the need to periodically update the script to pick newer "preferred" java versions, and remove the inconsistency in how the java version is selected between hadoop releases. It would parallel the Mac/Darwin detection, ie use the system's default method (if there is one), and if the user wants a different version then they must either change the system default or explicitly set JAVA_HOME.
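A rough sketch of the "system default first" order described above, again in Python only for illustration. The two default locations are the ones named earlier in this comment; the function name `detect_java_home` and the check for `bin/java` are my assumptions, not what the script does today.

```python
import os

# Default locations mentioned in this discussion; treated here as symlinks
# maintained by the host OS.
DEFAULT_LOCATIONS = ("/usr/java/default", "/usr/lib/jvm/default-java")


def detect_java_home(env=None, defaults=DEFAULT_LOCATIONS):
    """Hypothetical detection order: explicit JAVA_HOME, then the system default."""
    env = os.environ if env is None else env
    explicit = env.get("JAVA_HOME")
    if explicit:
        # The user asked for a specific java, so respect it unconditionally.
        return explicit
    for path in defaults:
        # Assumption: a usable install has an executable at bin/java.
        if os.path.isfile(os.path.join(path, "bin", "java")):
            return os.path.realpath(path)
    # No explicit setting and no system default: the caller should report an error.
    return None
```

Nothing in this order mentions a java version, so the script would not need updating when a newer jdk becomes the preferred one.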