Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
We recently released Anthelion (https://github.com/yahoo/anthelion), a focused crawler plugin for structured data that can be extracted with Any23. As proposed by Lewis John McGibbney, we think that integrating the parser (Any23) and the scoring function based on the online learner could be a good improvement for Nutch.