Nutch / NUTCH-2168

Parse-tika fails to retrieve parser

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.1
    • Fix Version/s: 2.3.1
    • Component/s: parser
    • Labels: None
    • Patch Info: Patch Available
    • Flags: Patch

    Description

      The parse-tika plugin fails to parse most (possibly all) document types (PDF, xlsx, ...) when run via ParserChecker or ParserJob:

      2015-11-12 19:14:30,903 INFO  parse.ParserJob - Parsing http://localhost/pdftest.pdf
      2015-11-12 19:14:30,905 INFO  parse.ParserFactory - ...
      2015-11-12 19:14:30,907 ERROR tika.TikaParser - Can't retrieve Tika parser for mime-type application/pdf
      2015-11-12 19:14:30,913 WARN  parse.ParseUtil - Unable to successfully parse content http://localhost/pdftest.pdf of type application/pdf
      

      The same document is successfully parsed by TestPdfParser.
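
To check whether the failure affects only some document types or all of them, the error lines can be tallied per MIME type. The following is a minimal sketch, not part of Nutch: the function name `failed_mime_types` is hypothetical, and the regular expression assumes the log format shown in the excerpt above.

```python
import re
from collections import Counter

# Matches the TikaParser error line as it appears in the log excerpt above.
# Assumption: other Nutch versions or log4j layouts may format this differently.
TIKA_ERROR = re.compile(
    r"ERROR\s+tika\.TikaParser - Can't retrieve Tika parser for mime-type (\S+)"
)


def failed_mime_types(log_lines):
    """Count 'Can't retrieve Tika parser' errors per MIME type."""
    counts = Counter()
    for line in log_lines:
        match = TIKA_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the report.
sample = [
    "2015-11-12 19:14:30,903 INFO  parse.ParserJob - Parsing http://localhost/pdftest.pdf",
    "2015-11-12 19:14:30,907 ERROR tika.TikaParser - Can't retrieve Tika parser for mime-type application/pdf",
    "2015-11-12 19:14:30,913 WARN  parse.ParseUtil - Unable to successfully parse content http://localhost/pdftest.pdf of type application/pdf",
]

print(failed_mime_types(sample))  # Counter({'application/pdf': 1})
```

To run this against a real log, pass an open file object instead of `sample`; the log location (for example `logs/hadoop.log`) depends on the local setup and is an assumption here.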

      Attachments

        1. NUTCH-2168.patch
          0.7 kB
          Sebastian Nagel

          People

            Assignee: Unassigned
            Reporter: Sebastian Nagel (snagel)
            Votes: 0
            Watchers: 4