Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels:

      Description

      I wrote a sample standalone application with PDFBox 1.6 to read PDF files. For one particular PDF (attached), the parser returns ??? characters; a few other PDFs work fine.
      Is there a problem with this PDF file? I have checked it with other vendors' parsers and they return the proper text. I get these ??? characters only from PDFBox.
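
One way to narrow a report like this down, offered as a sketch and not as a confirmed diagnosis of this issue: check whether the extracted string really contains literal question marks (U+003F), or whether other characters are being replaced with `?` when the text is printed or written with a charset that cannot represent them. The class and method names below (`QuestionMarkCheck`, `describeCodePoints`, `roundTrip`) are hypothetical helpers, and the sample string stands in for whatever the extraction call returns (for example `PDFTextStripper.getText`, if that is what the sample application uses).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; not part of PDFBox or Tika.
class QuestionMarkCheck {

    // Render every code point as U+XXXX so that a literal '?' (U+003F)
    // can be told apart from a character that only *displays* as '?'.
    public static String describeCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Simulate writing the text through a given charset and reading it back.
    // String.getBytes(Charset) substitutes the charset's replacement bytes
    // for characters it cannot encode.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Assumption: stand-in for the string returned by the text extractor.
        String extracted = "caf\u00e9";

        System.out.println(describeCodePoints(extracted));
        System.out.println(roundTrip(extracted, StandardCharsets.US_ASCII));
        System.out.println(roundTrip(extracted, StandardCharsets.UTF_8));
    }
}
```

If the code point dump of the real output shows only U+003F where the description text should be, the question marks come from the extraction itself. If it shows other code points that merely print as `?`, the output encoding of the console or file is the more likely culprit.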

      1. aaa1.pdf
        88 kB
        Ravi Kumar

        Activity

        John Hewson made changes -
        Component/s Text extraction [ 12312228 ]
        Component/s Parsing [ 12312226 ]
        Andreas Lehmkühler made changes -
        Labels ??? PDFBox textextraction
        Ravi Kumar added a comment -

        Is there any solution?

        Ravi Kumar added a comment -

        Also, if I use the Tika parser, Chinese (CJK) characters appear in the output, but the PDF doesn't contain any CJK characters.

        Ravi Kumar made changes -
        Field Original Value New Value
        Attachment aaa1.pdf [ 12521812 ]
        Ravi Kumar added a comment -

        In this file, the header is extracted as proper English text, but the description comes out as ?? characters.

        Ravi Kumar created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Ravi Kumar
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:

              Development