Nutch / NUTCH-2202

Integration of Anthelion (Focused Crawling Module) into Nutch

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parser, scoring

    Description

      We have recently released Anthelion (https://github.com/yahoo/anthelion), a focused crawler plugin that targets structured data extractable with Any23. As proposed by Lewis John McGibbney, we think that integrating the parser (Any23) and the scoring function based on Anthelion's online learner would be a good improvement for Nutch.
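      The description names the idea of a scoring function driven by an online learner but does not specify the model. As a hypothetical illustration only (this is NOT Anthelion's actual algorithm and does not use Nutch's ScoringFilter API), the following self-contained sketch assumes a logistic-regression learner over hashed URL tokens. It scores candidate URLs by the estimated probability that they lead to pages with structured data, then updates after each fetch using whether the parser (for example Any23) extracted anything. The class name `OnlineUrlScorer` and its methods are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of an online-learned URL scorer for focused crawling.
 * Assumption for illustration: logistic regression over hashed URL tokens,
 * updated one example at a time with stochastic gradient descent.
 * This is not Anthelion's real model.
 */
public class OnlineUrlScorer {
    private final double[] weights;
    private double bias = 0.0;
    private final double learningRate;

    public OnlineUrlScorer(int dimensions, double learningRate) {
        this.weights = new double[dimensions];
        this.learningRate = learningRate;
    }

    /** Splits a URL into lowercase alphanumeric tokens and hashes each into a bucket. */
    private int[] features(String url) {
        List<Integer> idx = new ArrayList<>();
        for (String token : url.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                idx.add(Math.floorMod(token.hashCode(), weights.length));
            }
        }
        int[] out = new int[idx.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = idx.get(i);
        }
        return out;
    }

    /** Estimated probability that the page behind this URL contains structured data. */
    public double score(String url) {
        double z = bias;
        for (int i : features(url)) {
            z += weights[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /**
     * Online SGD update after the page is fetched and parsed; the label is
     * true if the parser extracted structured data from it.
     */
    public void update(String url, boolean hasStructuredData) {
        double error = (hasStructuredData ? 1.0 : 0.0) - score(url);
        for (int i : features(url)) {
            weights[i] += learningRate * error;
        }
        bias += learningRate * error;
    }

    public static void main(String[] args) {
        OnlineUrlScorer scorer = new OnlineUrlScorer(1 << 12, 0.5);
        // Simulated crawl feedback: product pages yielded structured data, the about page did not.
        for (int round = 0; round < 50; round++) {
            scorer.update("http://shop.example.com/product/123", true);
            scorer.update("http://shop.example.com/about", false);
        }
        System.out.printf("product page: %.3f%n", scorer.score("http://shop.example.com/product/456"));
        System.out.printf("contact page: %.3f%n", scorer.score("http://shop.example.com/contact"));
    }
}
```

      A crawler using such a scorer would fetch higher-scoring URLs first; because the model updates after every parsed page, the ranking adapts during the crawl rather than requiring a separate offline training step.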

            People

              Assignee: Lewis John McGibbney (lewismc)
              Reporter: Robert Meusel (robertmeusel)
              Votes: 1
              Watchers: 6