NUTCH-2275

MD5Signature by default doesn't take the parse into account


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.11
    • Fix Version/s: 1.21
    • Component/s: parser
    • Labels: None

    Description

      I'm testing Apache Nutch with the feed plugin. I've noticed that it generates the same digest/signature for every page, so dedup removes everything from the database.

      I'm wondering why the class MD5Signature is the default one instead of TextMD5Signature.

      Anyhow, I've now modified MD5Signature slightly so that it works with the feed plugin.
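
      To illustrate the symptom, here is a minimal, self-contained sketch. It is not Nutch code and not the modification mentioned above. It assumes that every entry produced from one feed shares the feed's raw content, and that a raw-content signature hashes only those bytes while a text-based signature hashes the parsed text of each entry. The class `SignatureSketch` and its methods `rawSignature` and `textSignature` are hypothetical names used only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SignatureSketch {

    // Hex-encoded MD5 digest of a byte array.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical stand-in for a raw-content signature: only the fetched bytes are hashed.
    static String rawSignature(byte[] rawContent) {
        return md5Hex(rawContent);
    }

    // Hypothetical stand-in for a text-based signature: the parsed text is hashed.
    static String textSignature(String parsedText) {
        return md5Hex(parsedText.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Assumption: both entries come from the same feed document.
        byte[] feed = "<rss><item>First post</item><item>Second post</item></rss>"
                .getBytes(StandardCharsets.UTF_8);
        String firstEntryText = "First post";
        String secondEntryText = "Second post";

        // Both entries carry the same raw bytes, so their raw signatures collide.
        System.out.println("raw signatures equal:  "
                + rawSignature(feed).equals(rawSignature(feed)));

        // The parsed text differs per entry, so the text signatures differ.
        System.out.println("text signatures equal: "
                + textSignature(firstEntryText).equals(textSignature(secondEntryText)));
    }
}
```

      Under these assumptions, deduplication by raw signature would treat all entries of one feed as duplicates, which matches the behaviour described above. In Nutch the signature implementation is selected through the `db.signature.class` property, so switching to a text-based implementation such as TextMD5Signature may be worth trying before patching MD5Signature.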

          People

            Assignee: Unassigned
            Reporter: Francesco Capponi (capponi.francesco@gmail.com)
