Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels:

      Description

      I wrote a sample standalone application that reads PDFs with PDFBox 1.6. For one particular PDF the parser returns ??? characters, while a few other PDFs work fine.
      Is there a problem with the PDF file itself? I checked it with parsers from other vendors and they return the proper text. I get these ??? characters only from PDFBox.
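
      A minimal sketch for quantifying this symptom in a string of already-extracted text: it counts the placeholder characters ("?" and the Unicode replacement character U+FFFD) and any Han (Chinese) ideographs. The class and method names are hypothetical, it assumes Java 8 or later, and it does not call PDFBox itself. A literal "?" that is really in the document is counted too, so the numbers are only a rough indicator.

      ```java
      import java.lang.Character.UnicodeScript;

      public class ExtractionCheck {

          /** Counts '?' and U+FFFD, the usual stand-ins for a glyph that could not be mapped. */
          public static long countPlaceholders(String text) {
              return text.codePoints()
                         .filter(cp -> cp == '?' || cp == 0xFFFD)
                         .count();
          }

          /** Counts code points that belong to the Han script (Chinese ideographs). */
          public static long countHan(String text) {
              return text.codePoints()
                         .filter(cp -> UnicodeScript.of(cp) == UnicodeScript.HAN)
                         .count();
          }

          public static void main(String[] args) {
              // Pass the extracted text as the first argument; the default is a made-up sample.
              String extracted = args.length > 0 ? args[0] : "Header text ??? \u4E2D\u6587";
              System.out.println("placeholders: " + countPlaceholders(extracted));
              System.out.println("Han characters: " + countHan(extracted));
          }
      }
      ```

      Running the same check on the output of two different parsers for the same file gives a quick way to compare them.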

      1. aaa1.pdf
        88 kB
        Ravi Kumar

        Activity

        Transition: Open -> Closed
        Time In Source Status: 794d 11h 50m
        Execution Times: 1
        Last Executer: Tilman Hausherr
        Last Execution Date: 10/Jun/14 22:47
        Tilman Hausherr made changes -
        Status Open [ 1 ] Closed [ 6 ]
        Resolution Implemented [ 10 ]
        Tilman Hausherr added a comment -

        Whatever the problem was, it has been solved. The only "?" I get is for the (C), where Adobe Reader returns nothing.

        John Hewson made changes -
        Component/s Text extraction [ 12312228 ]
        Component/s Parsing [ 12312226 ]
        Andreas Lehmkühler made changes -
        Labels ??? PDFBox textextraction
        Ravi Kumar added a comment -

        Is there any solution?

        Ravi Kumar added a comment -

        And if I use the Tika parser, Chinese (CJK) characters appear, even though the PDF doesn't contain any CJK characters.

        Ravi Kumar made changes -
        Field Original Value New Value
        Attachment aaa1.pdf [ 12521812 ]
        Ravi Kumar added a comment -

        In this file, the headers come out as proper English text, but the description comes out as ?? characters.

        Ravi Kumar created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Ravi Kumar
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
