Nutch / NUTCH-2468

should filter out invalid URLs by default

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.12
    • Fix Version/s: 2.4, 1.14
    • Component/s: bin
    • Labels: None

    Description

      Some Nutch components should reject invalid URLs by default. This was recently discussed on the users mailing list and has affected my work for a while. Although there may be some special-purpose needs to collect invalid URLs, they are not generally useful for crawling.

          People

            Assignee: Unassigned
            Reporter: Michael Coffey
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:
