Description
I created a new MD5Signature that is based on the textual content. In our case we use boilerpipe to extract the main text from the content, so this signature is more effective for deduplication.
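To illustrate the idea, here is a minimal sketch of a text-based MD5 signature in plain Java. It is not the attached patch and does not use the Nutch Signature API; the class name `TextMD5SignatureSketch`, its methods, and the whitespace/case normalization step are assumptions made for illustration only. The input is assumed to be the main text already extracted (for example by boilerpipe).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: hashes extracted text instead of raw page bytes.
class TextMD5SignatureSketch {

    // Assumed normalization: collapse whitespace runs and lowercase, so that
    // trivial formatting differences do not change the signature.
    static String normalize(String text) {
        if (text == null) {
            return "";
        }
        return text.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // MD5 digest of the normalized text, encoded as UTF-8.
    static byte[] calculate(String extractedText) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return md5.digest(normalize(extractedText).getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Hex rendering of a digest, for logging or comparison.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

Two pages whose extracted main text is the same then get the same signature even if their surrounding markup (navigation, ads, templates) differs, which is the property that matters for deduplication.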
Attachments
Issue Links
- depends upon: NUTCH-1686 Optimize UpdateDb to load less field from Store (Closed)