Nutch / NUTCH-521

Modified injector to allow newly injected CrawlDatum to overwrite original

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.9.0
    • Fix Version/s: None
    • Component/s: injector
    • Labels: None
    • Environment: Tested on Solaris and Windows with Java 1.5

    Description

      Before this patch, if a CrawlDatum is already in the crawldb, it is used in preference to the CrawlDatum created for the newly injected URL. This patch gives the user the ability to force the injected CrawlDatum to be used instead. The use case was a requirement for injected URLs to jump to the top of the TopN list so that we can guarantee they will be crawled immediately (useful for intranet crawling, where changes can trigger injects).
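
The merge behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the code from inject.patch and does not use Nutch's classes: `Datum`, `InjectMergeSketch`, and the `overwrite` flag are hypothetical names introduced only for illustration, and the `status` and `score` fields are assumed stand-ins for whatever a real CrawlDatum carries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a crawldb entry; NOT Nutch's CrawlDatum.
final class Datum {
    final String status; // illustrative label, e.g. "fetched" or "injected"
    final float score;   // illustrative score value

    Datum(String status, float score) {
        this.status = status;
        this.score = score;
    }
}

final class InjectMergeSketch {
    // Chooses which entry survives for a single URL.
    // overwrite == false: the existing entry wins (behaviour before the patch).
    // overwrite == true:  the injected entry wins (behaviour the patch enables).
    // The flag name is an assumption made for this sketch.
    static Datum merge(Datum existing, Datum injected, boolean overwrite) {
        if (existing == null) {
            return injected; // URL is new to the crawldb
        }
        if (injected == null) {
            return existing; // URL was not part of this inject
        }
        return overwrite ? injected : existing;
    }

    // Applies merge() to every injected URL and returns a new map,
    // leaving the input crawldb map unmodified.
    static Map<String, Datum> inject(Map<String, Datum> crawlDb,
                                     Map<String, Datum> injected,
                                     boolean overwrite) {
        Map<String, Datum> result = new HashMap<>(crawlDb);
        for (Map.Entry<String, Datum> e : injected.entrySet()) {
            Datum current = result.get(e.getKey());
            result.put(e.getKey(), merge(current, e.getValue(), overwrite));
        }
        return result;
    }
}
```

With `overwrite` set to false, re-injecting a URL that is already known leaves its existing entry in place; with it set to true, the freshly injected entry replaces the old one, which is the precondition the description relies on for pushing the URL up the TopN list.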

      Attachments

        1. inject.patch
          1 kB
          Rob Young

          People

            Assignee: Unassigned
            Reporter: Rob Young (bubblenut)
            Votes: 0
            Watchers: 1