PDFBox / PDFBOX-1152

Japanese text is scrambled when reading a PDF file

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 2.0.0
    • Component/s: Text extraction
    • Labels:
    • Environment:
      Windows XP Service Pack 3, Pentium 4, 1 GB RAM

      Description

      When a Japanese PDF file is converted to XML, the Japanese text in the output is scrambled.

      Attachments:
      1. SamplePDF.pdf (8 kB, Suresh Somanathan)
      2. SamplePDF.xml (0.1 kB, Suresh Somanathan)

        Activity

        Andreas Lehmkühler added a comment -

        I've no idea how you created that XML output (AFAIK PDFBox doesn't provide any tool that does that), but I do know that text extraction works fine with the current trunk.


          People

          • Assignee:
            Andreas Lehmkühler
            Reporter:
            Suresh Somanathan
          • Votes:
            0
            Watchers:
            1

              Time Tracking

              Estimated:
              24h
              Remaining:
              24h
              Logged:
              Not Specified
