  Nutch / NUTCH-2275

MD5Signature by default doesn't take the parse into account

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.11
    • Fix Version/s: 1.21
    • Component/s: parser
    • Labels: None

    Description

      I'm testing Apache Nutch with the feed plugin. I've noticed that it generates the same digest/signature for every page, so the dedup removes everything from the database.

      I'm wondering why the class MD5Signature is the default one instead of TextMD5Signature.

      For now, I've slightly modified MD5Signature so that it works with the feed plugin.
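
      To illustrate the symptom, here is a minimal standalone sketch. It does not use the Nutch classes; `SignatureSketch`, `rawSignature`, and `textSignature` are hypothetical names. It assumes that a signature computed from the raw fetched bytes sees the same input for every item parsed out of one feed, while a signature computed from the parsed text sees a different input per item. Whether that matches the feed plugin exactly is an assumption on my part.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SignatureSketch {

    // MD5 of arbitrary bytes, rendered as 32 lowercase hex characters.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical stand-in for a signature over the raw fetched content.
    static String rawSignature(byte[] rawContent) {
        return md5Hex(rawContent);
    }

    // Hypothetical stand-in for a signature over the parsed text.
    static String textSignature(String parsedText) {
        return md5Hex(parsedText.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Assumption: every item of a feed shares the feed's raw bytes.
        byte[] feed = "<rss><item>First post</item><item>Second post</item></rss>"
                .getBytes(StandardCharsets.UTF_8);
        String[] itemTexts = {"First post", "Second post"};

        for (String text : itemTexts) {
            // The raw signature repeats for each item; the text signature does not.
            System.out.println(rawSignature(feed) + "  " + textSignature(text));
        }
    }
}
```

      In this sketch the first column is identical for both items and the second differs, which is the situation in which a digest-based dedup would treat all items as duplicates. As far as I know, the signature implementation is selected with the `db.signature.class` property, so switching it to `org.apache.nutch.crawl.TextMD5Signature` in nutch-site.xml may be an alternative to patching MD5Signature.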

          People

            Assignee: Unassigned
            Reporter: Francesco Capponi
            Votes: 0
            Watchers: 3
