Affects Version/s: nutchgora, 1.6
Fix Version/s: 1.10
Patch Info: Patch Available
The default rules of URLNormalizerRegex remove the anchor only up to the first
occurrence of ? or &. The remaining part of the anchor is kept,
which may cause a large, possibly infinite, number of outlinks when the same document
is fetched again and again under different URLs.
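A minimal Java sketch of the behavior described above. It assumes the default anchor rule is the regex `#.*?(\?|&|$)` with substitution `$1`. This is a reconstruction of the rule in regex-normalize.xml and is not quoted from this issue. The class name `AnchorRuleDemo` and the whole-anchor pattern `#.*$` are hypothetical and serve only as a contrast. They are not the attached patch.

```java
import java.util.regex.Pattern;

public class AnchorRuleDemo {
    // Assumed default rule: match '#' lazily up to the first '?' or '&'
    // (or end of input) and keep only the captured delimiter via "$1".
    static final Pattern DEFAULT_ANCHOR = Pattern.compile("#.*?(\\?|&|$)");

    // Hypothetical alternative for comparison: drop the entire fragment.
    static final Pattern WHOLE_ANCHOR = Pattern.compile("#.*$");

    static String applyDefault(String url) {
        return DEFAULT_ANCHOR.matcher(url).replaceAll("$1");
    }

    static String applyWhole(String url) {
        return WHOLE_ANCHOR.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://example.com/page#section",
            "http://example.com/page#view?id=1",
            "http://example.com/page#view?id=2&sort=asc"
        };
        for (String u : urls) {
            // Left column: assumed default rule; right column: whole-anchor removal.
            System.out.println(applyDefault(u) + "  |  " + applyWhole(u));
        }
    }
}
```

Under the assumed default rule, the last two URLs normalize to `http://example.com/page?id=1` and `http://example.com/page?id=2&sort=asc`. The crawler therefore treats them as distinct pages, even though both point to the same document. Removing the whole fragment collapses all three to `http://example.com/page`.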
Parameters in inner-page anchors are common practice on AJAX web sites.
Currently, crawling AJAX content is not supported (NUTCH-1323).