TIKA-808

Fork Parser doesn't work for PDF files

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0
    • Fix Version/s: 1.1
    • Component/s: parser
    • Labels:
      None

      Description

      There seems to be something wrong with the fork parser and PDF files.

      If you run tika-app with the --text option against tika-parsers/src/test/resources/test-documents/testPDF.pdf, you get the text of the PDF back. However, with "-f --text", no text is returned, and no errors are reported either.

        Activity

        Nick Burch created issue -
        Nick Burch added a comment -

        I've added some unit tests in r1213131 for this case. I've had to disable them though, as they don't currently pass (they blow up on a class loader issue, and there may well be other problems too beyond that)

        Jerome Lacoste added a comment -

        I've worked on getting the tests to pass. There were several problems, so I will have to open an issue per problem.

        Jerome Lacoste made changes -
        Field Original Value New Value
        Attachment 0001-TIKA-808-tika-doesn-t-parse-PDF-file.-The-issue-is-c.patch [ 12508518 ]
        Jukka Zitting added a comment -

        Fixed in revision 1222886 by avoiding inner classes.

        Enabled the tests (thanks, Nick!) in revision 1222887.

        PS. Nick, do you have indentation intentionally set at three spaces?

        Jukka Zitting made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Assignee Jukka Zitting [ jukkaz ]
        Fix Version/s 1.1 [ 12318849 ]
        Resolution Fixed [ 1 ]
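
The resolution above says only that the fix worked "by avoiding inner classes", without further detail. As general Java background, and not a description of what revision 1222886 actually changed, the sketch below shows one common way inner classes cause trouble when objects must be passed to another process: a non-static inner class holds a hidden reference to its enclosing instance, so it cannot be serialized unless that enclosing object is serializable too. All names here (`Handler`, `StaticHandler`, `Owner`, `canSerialize`) are hypothetical and are not part of Tika.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class InnerClassSerializationSketch {

    /** Hypothetical stand-in for an object that must be sent to a forked process. */
    interface Handler extends Serializable {
        String name();
    }

    /** Static nested class: carries no hidden reference to an enclosing instance. */
    static class StaticHandler implements Handler {
        private static final long serialVersionUID = 1L;

        @Override
        public String name() {
            return "static";
        }
    }

    /** Hypothetical owner that is deliberately NOT Serializable. */
    static class Owner {
        private final String label = "inner";

        Handler innerHandler() {
            // The anonymous inner class reads Owner.this.label, so it keeps
            // an implicit reference to this Owner instance.
            return new Handler() {
                @Override
                public String name() {
                    return label;
                }
            };
        }
    }

    /** Returns true if the object can be written with standard Java serialization. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            // Raised when the object graph contains a non-serializable object,
            // here the captured Owner instance.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("static nested: " + canSerialize(new StaticHandler()));
        System.out.println("inner: " + canSerialize(new Owner().innerHandler()));
    }
}
```

Under these assumptions, the static nested handler serializes and the anonymous inner one does not. Whether this is the specific failure behind the class loader error mentioned earlier in the thread is not stated in the issue.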

          People

          • Assignee:
            Jukka Zitting
            Reporter:
            Nick Burch
          • Votes:
            0
            Watchers:
            1
