Nutch / NUTCH-836

Remove deprecated parse plugins


Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1
    • Fix Version/s: nutchgora
    • Component/s: parser
    • Labels: None
    • Patch Info: Patch Available

    Description

      Some of the parser plugins in 1.1 are covered by the parse-tika plugin. These plugins have been kept in 1.1 but should be removed from 2.0, where we'll rely on parse-tika almost exclusively. Some existing plugins might be kept when there is no equivalent in Tika (to be discussed). The patch removes the following plugins:

      • parse-html
      • parse-msexcel
      • parse-mspowerpoint
      • parse-msword
      • parse-pdf
      • parse-oo
      • parse-text
      • lib-jakarta-poi
      • lib-parsems

      The patch does not (yet) remove:

      • parse-ext
      • parse-js
      • parse-rss
      • parse-swf
      • parse-zip
      • feed

      Please review the patch and vote for its inclusion in the trunk.

      Attachments

        1. NUTCH-836-2.patch
          298 kB
          Julien Nioche


          People

            Assignee: Julien Nioche
            Reporter: Julien Nioche
            Votes: 0
            Watchers: 0
