Type: Task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.25
Component/s: None
Labels: None
When comparing files, it can be useful to compute a digest of the extracted text so that users can identify files that may have duplicate content but different overall (file-level) digests. Let's add a content digester to tika-eval's text stats calculator.
See: https://builds.apache.org/job/nutch-trunk/javadoc/org/apache/nutch/crawl/TextMD5Signature.html
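A minimal sketch of the idea, in the spirit of the linked Nutch `TextMD5Signature`. It assumes the content digest is an MD5 over the UTF-8 bytes of the extracted text, with no whitespace or case normalization. The class `TextContentDigest` and its method `md5Hex` are hypothetical names for illustration and are not tika-eval's actual API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical helper (not part of tika-eval's API): computes an MD5 digest
 * of extracted text so that files with identical text share a digest even
 * when their raw bytes, and therefore their file-level digests, differ.
 */
public final class TextContentDigest {

    private TextContentDigest() {
    }

    /** Returns the lowercase hex MD5 of the UTF-8 bytes of {@code text}. */
    public static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // Mask to an unsigned value and emit two hex digits per byte.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed example: the same text extracted from two different
        // container formats (e.g., a .docx and a .pdf) yields one content digest.
        String fromDocx = "Quarterly report\nRevenue grew.";
        String fromPdf = "Quarterly report\nRevenue grew.";
        System.out.println(md5Hex(fromDocx).equals(md5Hex(fromPdf)));
    }
}
```

Whether to normalize the text first (collapsing whitespace, lowercasing) is a design choice the ticket does not specify; normalizing would catch more near-duplicates at the cost of treating some genuinely different texts as equal.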