  1. Nutch
  2. NUTCH-2742

Unable to parse specific pdf file


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 1.15
    • Fix Version/s: None
    • Component/s: nutchNewbie, parser
    • Labels: None

    Description

      It appears that the Tika plugin is not parsing some PDF files.

      When I dumped the segment data, there was no content.

      EDIT: See the attachments for the output and crawl log.

       

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: Mark A M A
            Votes: 0
            Watchers: 2
