  Nutch / NUTCH-49

Flag for generate to fetch only new pages to complement the -refetchonly flag


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fetcher
    • Labels: None

    Description

      It would be useful, especially for research and testing purposes, to have a flag for the FetchListTool that includes in the fetchlist only those URLs that have not already been fetched (according to the information in the webdb from which the fetchlist is generated).
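
      The requested behaviour can be illustrated with a minimal sketch. It is not Nutch code and does not reflect the attached patch: `PageRecord`, `last_fetched`, `generate_fetchlist`, and `new_only` are hypothetical names introduced only for illustration, and the sketch assumes the webdb can tell whether a page has ever been fetched.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class PageRecord:
    """Hypothetical stand-in for a webdb entry; not a Nutch class."""
    url: str
    last_fetched: Optional[int]  # epoch seconds, or None if never fetched (assumption)


def generate_fetchlist(pages: Iterable[PageRecord], new_only: bool = False) -> List[str]:
    """Return the URLs to fetch; with new_only=True, skip pages fetched before."""
    return [
        page.url
        for page in pages
        if not new_only or page.last_fetched is None
    ]


# Example data: one page already fetched, two never fetched.
webdb = [
    PageRecord("http://example.com/a", last_fetched=1_100_000_000),
    PageRecord("http://example.com/b", last_fetched=None),
    PageRecord("http://example.com/c", last_fetched=None),
]
new_urls = generate_fetchlist(webdb, new_only=True)
```

      With `new_only` left at its default, all three URLs would be returned; with it set, only the two never-fetched pages remain in the list.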

      Attachments

        1. fetchnewonly.patch
          4 kB
          Luke Baker

        Activity

          People

            Assignee: Unassigned
            Reporter: Luke Baker (lukebaker)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: