Nutch / NUTCH-2032

Plugin to index the raw content of a readable document.

Details

    Description

      This is related to https://issues.apache.org/jira/browse/NUTCH-1785 and
      https://issues.apache.org/jira/browse/NUTCH-1458

      We created a couple of plugins to index the raw content of readable documents. When these plugins are included in the plugin chain, Nutch indexes the raw content of each readable document, e.g. XML, HTML, CSV, or TXT. The index-rawcontent plugin is not designed to index binary files; however, having the full content of an HTML, XML, or CSV document is critical for some of us.
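      The readable-versus-binary distinction described above could be sketched roughly as follows. This is only an illustration, not the code of the index-rawcontent plugin: the class name `RawContentSketch`, both method names, the MIME-type list, and the UTF-8 decoding are all assumptions, and the sketch deliberately avoids the Nutch API so that it runs on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of Nutch or of the index-rawcontent plugin.
public class RawContentSketch {

    // Assumed list of non-"text/*" types that are still human-readable.
    private static final Set<String> READABLE_TYPES =
            Set.of("application/xml", "application/xhtml+xml");

    // Returns true for text/* types (HTML, CSV, TXT, ...) and the XML types above.
    public static boolean isReadable(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" before comparing.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.startsWith("text/") || READABLE_TYPES.contains(mime);
    }

    // Returns the raw document text for readable types, or empty for binary ones.
    // Assumption: the bytes are UTF-8; a real plugin would honor the detected charset.
    public static Optional<String> rawContent(String contentType, byte[] content) {
        if (content == null || !isReadable(contentType)) {
            return Optional.empty();
        }
        return Optional.of(new String(content, StandardCharsets.UTF_8));
    }
}
```

      In this sketch a binary document such as a PDF or an image yields an empty result, so nothing would be added to the index for it, which matches the stated limitation that binary files are out of scope.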

            People

              lewismc Lewis John McGibbney
              betolink Luis Lopez
              Votes: 0
              Watchers: 2
