Details
Type: Task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
When comparing files, it can be useful to compute a digest of the extracted text so that users can identify files that may have duplicate text content even though their overall file digests differ. Let's add a content digester to tika-eval's text stats calculator.
See: https://builds.apache.org/job/nutch-trunk/javadoc/org/apache/nutch/crawl/TextMD5Signature.html
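A minimal sketch of the idea, not the tika-eval implementation: the class name `TextContentDigester`, the `normalize` helper, the whitespace-collapsing step, and the choice of SHA-256 are all assumptions made for illustration. It digests the extracted text rather than the raw bytes, so two files with the same text but different containers or formatting produce the same value.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class, for illustration only; not part of tika-eval.
public class TextContentDigester {

    // Assumed normalization: collapse whitespace runs to one space and trim,
    // so layout-only differences do not change the digest.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Returns the lowercase hex SHA-256 digest of the normalized text.
    static String digest(String extractedText) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(
                    normalize(extractedText).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 support is mandatory for Java platform implementations.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Same text, different whitespace (e.g. extracted from two formats).
        String fromFileA = "Hello   world\n";
        String fromFileB = "Hello world";
        System.out.println(digest(fromFileA).equals(digest(fromFileB))); // prints true
    }
}
```

Whether to normalize whitespace or case before digesting is a design choice: more normalization finds more near-duplicates but also raises the chance of treating genuinely different texts as the same.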