http://code.google.com/p/juniversalchardet/ has a pretty good, efficient charset detector, which is a Java port of the Mozilla universalchardet algorithms. It is licensed under the Mozilla Public License Version 1.1. I am not sure whether the MPL is compatible with ASF licensing; it appears to be, but I am not a lawyer. As far as I know, it does not provide the detection confidence or language detection features that ICU4J does, and I think it has code/data files for fewer encodings, but it is primarily statistical, so more could be added. I am also not sure what choices were made with regard to multiple encodings. In theory, it should detect what Firefox detects for a given URL/file.
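
For anyone who wants to try it, here is a rough sketch of calling the detector from Java. The class and method names (org.mozilla.universalchardet.UniversalDetector, CharsetListener, handleData, dataEnd, getDetectedCharset) are as I remember them from the project's sample code, so treat them as assumptions and check them against the jar you actually use. CharsetGuess and detect are hypothetical names made up for this example. The calls go through reflection only so that the snippet compiles and runs without the jar on the classpath, in which case it returns null; with the jar present you would normally call the class directly.

```java
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of juniversalchardet.
final class CharsetGuess {

    // Returns the detected charset name, or null if the detector has no
    // answer or the juniversalchardet jar is not on the classpath.
    static String detect(byte[] data) {
        try {
            // Assumed class names, recalled from the project's sample code.
            Class<?> listenerType =
                Class.forName("org.mozilla.universalchardet.CharsetListener");
            Class<?> detectorType =
                Class.forName("org.mozilla.universalchardet.UniversalDetector");

            // Equivalent to: new UniversalDetector(null)
            Object detector = detectorType
                .getConstructor(listenerType)
                .newInstance((Object) null);

            // Equivalent to: detector.handleData(data, 0, data.length)
            Method handleData = detectorType.getMethod(
                "handleData", byte[].class, int.class, int.class);
            handleData.invoke(detector, data, 0, data.length);

            // Equivalent to: detector.dataEnd()
            detectorType.getMethod("dataEnd").invoke(detector);

            // Equivalent to: detector.getDetectedCharset()
            return (String) detectorType
                .getMethod("getDetectedCharset")
                .invoke(detector);
        } catch (ReflectiveOperationException e) {
            // Jar missing, or the API differs from what is assumed above.
            return null;
        }
    }

    public static void main(String[] args) {
        // Japanese sample text, written with Unicode escapes so the source
        // file encoding does not matter.
        String sample = "\u3053\u3093\u306b\u3061\u306f\u4e16\u754c";
        byte[] bytes = sample.getBytes(StandardCharsets.UTF_8);
        System.out.println("Detected: " + detect(bytes));
    }
}
```

Note that the only result here is a charset name (or null); there is no confidence value to inspect, which matches my understanding above.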