Description
URLUtil.toUNICODE() fails on IDNs and returns null instead of the Unicode URL. The URI constructor apparently does not accept host names in Unicode form. For http://www.xn--evir-zoa.com/, the URI constructor throws the exception:
java.net.URISyntaxException: Illegal character in hostname at index 11: http://www.çevir.com/
In principle, IDN.toUnicode() can convert whole URLs, not only domain or host names. However, it does not convert URLs whose host consists of only two labels, such as http://xn--uni-tbingen-xhb.de/. Is that the reason why we need URLUtil.toUNICODE()?
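A possible explanation for the two-label case, not confirmed in this report, is that IDN.toUnicode() treats its input as dot-separated labels: in http://xn--uni-tbingen-xhb.de/ the first label would be "http://xn--uni-tbingen-xhb", which does not start with the "xn--" prefix and is therefore left alone. The following is a minimal sketch of one way to sidestep both problems, assuming it is acceptable to convert only the host and reassemble the URL as a plain string instead of going through the multi-argument URI constructor. The class IdnUrlSketch and the method hostToUnicode are hypothetical names used for illustration; they are not part of Nutch.

```java
import java.net.IDN;
import java.net.URI;

final class IdnUrlSketch {

    // Hypothetical helper (not Nutch code): converts only the host part to
    // Unicode and rebuilds the URL by string concatenation.
    static String hostToUnicode(String url) {
        // The ACE form is plain ASCII, so URI can parse it.
        // URI.create throws IllegalArgumentException on malformed input.
        URI u = URI.create(url);
        String host = u.getHost();
        if (u.getScheme() == null || host == null) {
            return url; // nothing to convert; return the input unchanged
        }
        StringBuilder sb = new StringBuilder();
        sb.append(u.getScheme()).append("://");
        if (u.getRawUserInfo() != null) {
            sb.append(u.getRawUserInfo()).append('@');
        }
        sb.append(IDN.toUnicode(host)); // only the host goes through IDN
        if (u.getPort() != -1) {
            sb.append(':').append(u.getPort());
        }
        if (u.getRawPath() != null) {
            sb.append(u.getRawPath());
        }
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        if (u.getRawFragment() != null) {
            sb.append('#').append(u.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hostToUnicode("http://www.xn--evir-zoa.com/"));
        System.out.println(hostToUnicode("http://xn--uni-tbingen-xhb.de/"));
        // For comparison: passing the whole URL to IDN.toUnicode()
        System.out.println(IDN.toUnicode("http://xn--uni-tbingen-xhb.de/"));
    }
}
```

Because the result is built as a String, the Unicode host never passes through URI's host-name validation. Whether this fits URLUtil.toUNICODE() depends on how callers use the return value.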
Attachments
Issue Links
- duplicates NUTCH-1681 In URLUtil.java, toUNICODE method does not work correctly (Closed)