Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Patch Available
Description
Right now, internationalized domain names (IDNs) are indexed in their ASCII (Punycode) form. An IDNNormalizer is to be used with an indexer so that it converts ASCII URLs to their proper Unicode equivalents.
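As a rough illustration of the conversion described here, the following sketch decodes the host part of a URL from its ASCII form to Unicode using the JDK's `java.net.IDN` class. The class name `IdnNormalizerSketch` and the method `toUnicodeHost` are hypothetical names chosen for this example; this is not the attached patch and not Nutch's normalizer API, only a minimal stand-alone approximation of the idea.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Nutch: converts only the host of a URL
// from its ASCII (Punycode) form to Unicode and leaves the rest untouched.
final class IdnNormalizerSketch {

    // e.g. "http://xn--bcher-kva.example/" becomes "http://bücher.example/"
    static String toUnicodeHost(String urlString) {
        URI uri;
        try {
            uri = new URI(urlString);
        } catch (URISyntaxException e) {
            // Assumption for this sketch: unparsable input is returned unchanged.
            return urlString;
        }
        String host = uri.getHost();
        if (uri.getScheme() == null || host == null || host.isEmpty()) {
            return urlString;
        }

        // IDN.toUnicode returns the input as-is when a label cannot be decoded.
        String unicodeHost = IDN.toUnicode(host);

        // Rebuild the URL by hand so the path, query, and fragment keep their
        // original (raw) encoding.
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(unicodeHost);
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        if (uri.getRawPath() != null) {
            sb.append(uri.getRawPath());
        }
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }
}
```

URLs whose host is already plain ASCII without an `xn--` label pass through unchanged, so a helper like this could be applied to every URL before indexing.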
Attachments
Issue Links
- depends upon: NUTCH-1681 In URLUtil.java, toUNICODE method does not work correctly (Closed)
- is superseded by: NUTCH-2746 Basic URL normalizer to normalize Unicode domain names (Closed)
- relates to: NUTCH-1320 IndexChecker and ParseChecker choke on IDN's (Closed)