Details

    • Type: New Feature
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.0
    • Component/s: Text extraction
    • Labels:
      None

      Description

      [imported from SourceForge]
      http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=765686
      Originally submitted by bguan on 2003-07-03 17:57.

      Another feature I need a lot is the correct interpretation
      of CJK encoding.

      Yes, I know PDF can be a pain when it comes to
      correctly interpreting CJK charsets, as many factors are
      involved, including whether a font (or its subset) is
      embedded or not.

      Attached is a simple Korean PDF that so far has not
      been correctly interpreted by any Java-based
      open-source library, though XPDF renders it
      correctly on both Linux and Windows.

      [attachment on SourceForge]
      http://sourceforge.net/tracker/download.php?group_id=78314&atid=552835&aid=765686&file_id=80181
      CJK.zip, 142061 bytes
      CJK PDF, output and test program

      [comment on SourceForge]
      Originally sent by bguan.
      Logged In: YES
      user_id=815589

      Hello Ben,

      Thanks for the response. I just downloaded PDFBox 0.6.5 and
      wrote a little sample program to test it against 3 CJK PDF files
      I have, and the output is still no good. I have attached my
      sample program, the 3 PDFs and the output in the attached
      zip file.

      Can you tell me what I am doing wrong?

      The PDF files were generated with Adobe Acrobat 5.0,
      using embedded fonts, I believe.

      Thank you.

      [comment on SourceForge]
      Originally sent by benlitchfield.
      Logged In: YES
      user_id=601708

      There was no attachment with this. I have done some CJK
      work in the 0.6.5 release. Please attach the document and I
      can take a look at it. (Make sure you check the 'attach file'
      checkbox.)

      Ben

        Attachments

        1. PDFBOX5-CJK.zip
          139 kB
          Andreas Lehmkühler

              People

              • Assignee:
                lehmi Andreas Lehmkühler
                Reporter:
                Anonymous
              • Votes:
                0
                Watchers:
                0
