Details
Bug
Status: Closed
Major
Resolution: Not A Problem
1.19
None
None
Description
Nutch uses LinkContentHandler to collect hyperlinks, but it does not report any hyperlinks for http://www.ronaldmcdonaldhouse.co.uk/, which I'll also attach to this ticket.
Debugging LinkContentHandler by printing element names in startElement reveals that only very few HTML elements are reported, which I think is incorrect.
Our own parser in Nutch uses a custom ContentHandler and does report many elements, including hyperlinks.
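
For reference, the kind of startElement debugging described above can be sketched roughly as follows. This is not the Nutch or Tika code: ElementLoggingHandler is a hypothetical class name, and the sketch assumes the JDK's built-in SAX parser fed a small well-formed XHTML string rather than the real page or a real HTML parser. It only illustrates recording every element name a ContentHandler receives, alongside any href found on an a element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical debugging handler: records each element name passed to
// startElement and collects href values from <a> elements.
public class ElementLoggingHandler extends DefaultHandler {
    private final List<String> elements = new ArrayList<>();
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default JDK parser is not namespace-aware, so qName carries the element name.
        elements.add(qName);
        if ("a".equalsIgnoreCase(qName)) {
            String href = atts.getValue("href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    public List<String> getElements() {
        return elements;
    }

    public List<String> getLinks() {
        return links;
    }

    // Assumption: the input is well-formed XHTML; real-world HTML would need an HTML-tolerant parser.
    public static ElementLoggingHandler parse(String xhtml) throws Exception {
        ElementLoggingHandler handler = new ElementLoggingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>Hi</p><a href=\"http://example.org/\">x</a></body></html>";
        ElementLoggingHandler h = parse(xhtml);
        System.out.println("elements: " + h.getElements());
        System.out.println("links: " + h.getLinks());
    }
}
```

Comparing the recorded element list from a handler like this against the elements actually present in the page is one way to see which elements never reach the handler.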