Details
- New Feature
- Status: Closed
- Major
- Resolution: Duplicate
- 1.10
- None
Description
This is related to https://issues.apache.org/jira/browse/NUTCH-1785 and https://issues.apache.org/jira/browse/NUTCH-1458.
We created a couple of plugins that index the raw content of readable documents. When these plugins are included in the plugin chain, the raw content of each readable document (XML, HTML, CSV, TXT, etc.) is indexed. The index-rawcontent plugin is not designed to index binary files; however, having the full content of an HTML, XML, or CSV document is critical for some of us.
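To make the intended behavior concrete, here is a minimal, language-neutral sketch of the selection logic described above: keep the raw content of readable (text-like) documents and skip binary ones. It is not the index-rawcontent plugin and does not use the Nutch plugin API. The MIME type list, the `rawcontent` field name, and all function names are assumptions made for illustration.

```python
from typing import Optional

# Assumption: which MIME types count as "readable". The issue only names
# XML, HTML, CSV and TXT as examples; this list is illustrative.
READABLE_MIME_TYPES = frozenset({
    "text/html", "text/plain", "text/csv", "text/xml",
    "application/xml", "application/xhtml+xml",
})


def is_readable(mime_type: str) -> bool:
    """Return True if the MIME type (ignoring parameters) is text-like."""
    base = mime_type.split(";", 1)[0].strip().lower()
    return base in READABLE_MIME_TYPES


def raw_content_field(content: bytes, mime_type: str,
                      encoding: str = "utf-8") -> Optional[str]:
    """Decode the fetched bytes for readable documents; None for binary ones."""
    if not is_readable(mime_type):
        return None
    # Undecodable bytes are replaced rather than raising an error.
    return content.decode(encoding, errors="replace")


def index_document(doc: dict, content: bytes, mime_type: str) -> dict:
    """Return a copy of doc with a hypothetical 'rawcontent' field added."""
    result = dict(doc)
    raw = raw_content_field(content, mime_type)
    if raw is not None:
        result["rawcontent"] = raw
    return result
```

With this sketch, `index_document({"url": "u"}, b"a,b", "text/csv")` gains a `rawcontent` field, while a document fetched as `application/pdf` is returned without one.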
Issue Links
- duplicates: NUTCH-1785 Ability to index raw content (Closed)
- is duplicated by: NUTCH-1785 Ability to index raw content (Closed)