Quite simply, LinkContentHandler never implemented the link and iframe tags, so links in those elements are not collected. NUTCH-1233 effectively requires this support.
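To illustrate what handling these tags involves, here is a minimal sketch that uses only the JDK's SAX API. It is not Tika's LinkContentHandler. The class SimpleLinkCollector and its FoundLink record are hypothetical names made up for this example. The sketch assumes that the outlink comes from href on a and link elements and from src on img and iframe elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical illustration only, NOT Tika's LinkContentHandler.
 * Collects outlinks from XHTML SAX events and covers <link> and <iframe>
 * in addition to <a> and <img>.
 */
public class SimpleLinkCollector extends DefaultHandler {

    /** Minimal link record: the element it came from and its URI. */
    public static final class FoundLink {
        public final String type;
        public final String uri;

        FoundLink(String type, String uri) {
            this.type = type;
            this.uri = uri;
        }

        @Override
        public String toString() {
            return type + ":" + uri;
        }
    }

    private final List<FoundLink> links = new ArrayList<>();

    public List<FoundLink> getLinks() {
        return links;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Fall back to qName if the parser is not namespace aware.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        switch (name.toLowerCase(Locale.ROOT)) {
            case "a":
            case "link":   // e.g. <link rel="stylesheet" href="...">
                add(name, atts.getValue("href"));
                break;
            case "img":
            case "iframe": // e.g. <iframe src="...">
                add(name, atts.getValue("src"));
                break;
            default:
                break;
        }
    }

    // Record the link only if the attribute is present and non-blank.
    private void add(String type, String value) {
        if (value != null && !value.trim().isEmpty()) {
            links.add(new FoundLink(type.toLowerCase(Locale.ROOT), value.trim()));
        }
    }

    /** Convenience helper: parse an XHTML string and return the links in document order. */
    public static List<FoundLink> collect(String xhtml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SimpleLinkCollector handler = new SimpleLinkCollector();
        factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getLinks();
    }
}
```

For example, parsing an XHTML page that contains a stylesheet link, an anchor, an iframe and an image yields four entries in document order. Tika's real handler consumes the XHTML SAX events that its parsers emit and returns its own Link objects, so treat this only as a model of the missing cases.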
Rely on Tika for outlink extraction
Upgrade to Tika 1.12
LinkContentHandler skips script tags
Add a ContentHandler for collecting links from parser output