Nutch / NUTCH-2032

Plugin to index the raw content of a readable document.


    Details

      Description

      This is related to https://issues.apache.org/jira/browse/NUTCH-1785 and
      https://issues.apache.org/jira/browse/NUTCH-1458

      We created a couple of plugins to index the raw content of readable documents. If these plugins are included in the plugin chain, Nutch will index the raw content of readable (text-based) documents, e.g. XML, HTML, CSV and TXT. The index-rawcontent plugin is not designed to index binary files; however, having the full content of an HTML, XML or CSV document is critical for some use cases.
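      The core idea, indexing raw bytes only for text-based documents and skipping binary ones, can be illustrated with a small, self-contained Java sketch. It does NOT use the Nutch plugin API (in Nutch this logic would live in an indexing filter plugin). The class name RawContentSketch, the field name "rawcontent", the list of readable MIME types and the UTF-8 decoding are all hypothetical assumptions for illustration, not the actual index-rawcontent implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, NOT the Nutch IndexingFilter API: decides whether a
 * fetched document is "readable" (text-based) and, if so, copies its raw
 * bytes into an index field. All names here are illustrative only.
 */
public class RawContentSketch {

    // Assumed set of text-based MIME types; the real plugin's list may differ.
    private static final Set<String> READABLE_TYPES = Set.of(
            "text/html", "text/xml", "application/xml", "text/csv", "text/plain");

    // Hypothetical name of the index field that receives the raw content.
    static final String RAW_CONTENT_FIELD = "rawcontent";

    /** Returns true if the base MIME type (parameters ignored) is text-based. */
    static boolean isReadable(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Strip parameters such as "; charset=UTF-8" and normalize case.
        String base = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        // Also accept XML-based types such as application/xhtml+xml.
        return READABLE_TYPES.contains(base) || base.endsWith("+xml");
    }

    /**
     * Adds the raw content to the document fields when the type is readable;
     * binary types are left untouched. Assumes the bytes are UTF-8 encoded.
     */
    static Map<String, String> filter(Map<String, String> fields, String contentType, byte[] raw) {
        if (raw != null && isReadable(contentType)) {
            fields.put(RAW_CONTENT_FIELD, new String(raw, StandardCharsets.UTF_8));
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] csv = "id,name\n1,nutch\n".getBytes(StandardCharsets.UTF_8);
        Map<String, String> doc = filter(new HashMap<>(), "text/csv; charset=UTF-8", csv);
        System.out.println(doc.containsKey(RAW_CONTENT_FIELD)); // true

        Map<String, String> pdf = filter(new HashMap<>(), "application/pdf", new byte[] {0x25, 0x50});
        System.out.println(pdf.containsKey(RAW_CONTENT_FIELD)); // false
    }
}
```

      Gating on the declared MIME type keeps the check cheap; a real plugin might instead rely on Nutch's own content-type detection, and would need to honor the document's actual character encoding rather than assuming UTF-8.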

              People

              • Assignee:
                lewismc Lewis John McGibbney
                Reporter:
                betolink Luis Lopez
              • Votes:
                0
                Watchers:
                3
