PDFBox / PDFBOX-5

CJK decoding

Details

    • New Feature
    • Status: Closed
    • Resolution: Fixed
    • None
    • 1.7.0
    • Text extraction
    • None

    Description

      [imported from SourceForge]
      http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=765686
      Originally submitted by bguan on 2003-07-03 17:57.

      Another feature I badly need is correct interpretation
      of CJK encodings.

      Yes, I know PDF can be a pain when it comes to
      correctly interpreting CJK charsets, as many factors are
      involved, including whether a font (or a subset of it) is
      embedded or not.
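To make the point about font-dependent decoding concrete, here is a minimal Python sketch (not PDFBox code) of one common CJK case: a Type0 font using the Identity-H encoding, where each two-byte string code is used directly as a CID, so readable text can only be recovered through the font's ToUnicode CMap. The mapping table, the sample bytes, and the helper names `parse_bfchar` and `decode_identity_h` are hypothetical, chosen for illustration; real CMaps also use `bfrange` sections, which this sketch ignores.

```python
import re

# Hypothetical ToUnicode entries in the style of a CMap "bfchar" section:
# each pair maps a 2-byte source code to a UTF-16BE destination string.
TOUNICODE_BFCHAR = """
<0001> <D55C>
<0002> <AD6D>
<0003> <C5B4>
"""


def parse_bfchar(text: str) -> dict[int, str]:
    """Parse '<src> <dst>' pairs into a code -> Unicode string map."""
    mapping = {}
    for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", text):
        mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_identity_h(raw: bytes, to_unicode: dict[int, str]) -> str:
    """Split a string operand into 2-byte codes (as Identity-H does) and map
    each code through the ToUnicode table; unmapped codes become U+FFFD."""
    if len(raw) % 2:
        raise ValueError("Identity-H strings must have an even number of bytes")
    out = []
    for i in range(0, len(raw), 2):
        code = int.from_bytes(raw[i:i + 2], "big")
        out.append(to_unicode.get(code, "\ufffd"))
    return "".join(out)


if __name__ == "__main__":
    cmap = parse_bfchar(TOUNICODE_BFCHAR)
    print(decode_identity_h(bytes.fromhex("000100020003"), cmap))  # prints 한국어
```

Without the ToUnicode table, the same bytes decoded naively (for example as Latin-1) come out as control characters, which is one reason extracted CJK text can look like garbage even when the page renders correctly.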

      Attached is a simple Korean PDF that so far has not
      been correctly interpreted by any Java-based
      open-source library, although it is rendered
      correctly by XPDF on both Linux and Windows.

      [attachment on SourceForge]
      http://sourceforge.net/tracker/download.php?group_id=78314&atid=552835&aid=765686&file_id=80181
      CJK.zip, 142061 bytes
      CJK PDF, output and test program

      [comment on SourceForge]
      Originally sent by bguan.
      user_id=815589

      Hello Ben,

      Thanks for the response. I just downloaded PDFBox 0.6.5 and
      wrote a small sample program to test it against three CJK PDF
      files I have, and the output is still wrong. I have attached my
      sample program, the three PDFs and the output in the attached
      zip file.

      Can you tell me what I am doing wrong?

      The PDF files were generated with Adobe Acrobat 5.0,
      using embedded fonts, I believe.

      Thank you.

      [comment on SourceForge]
      Originally sent by benlitchfield.
      user_id=601708

      There was no attachment with this. I have done some CJK
      work in the 0.6.5 release. Please attach the document and I
      can take a look at it. (Make sure you check the 'attach file'
      checkbox.)

      Ben

      Attachments

        1. PDFBOX5-CJK.zip
          139 kB
          Andreas Lehmkühler

            People

              lehmi Andreas Lehmkühler
              Anonymous Anonymous
              Votes: 0
              Watchers: 0