[PDFBOX-2272] Can't extract vertical text correctly - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.8.6, 2.0.0
Fix Version/s: None
Component/s: Text extraction
Labels: None

Description

~~1.8.6 can't extract the Unicode text because it fails to map the UCS2 CMap for 90ms-RKSJ-V.~~
2.0 extracts the text but doesn't handle the vertical layout.

Also see the file from ~~PDFBOX-2294~~, which contains both horizontal and vertical text.

Attachments

- test.txt (2 kB), uploaded 18/Aug/14 01:01 by Biligsaikhan Batjargal
- test.pdf (54 kB), uploaded 18/Aug/14 01:01 by Biligsaikhan Batjargal
- pdfbox_new_vertical_text_extraction.patch (6 kB), uploaded 23/Jul/15 11:10 by Andreas Meier

Issue Links

- is duplicated by: PDFBOX-2879 Wrong vertical text extraction for apache PDFBox 2.0.0 (Closed)
- is related to: PDFBOX-800 Wrong text extract from vertical textboxes in pdf files (Open)
- relates to: PDFBOX-2711 Japanese text not extracted (Closed)

People

Assignee: Unassigned

Reporter: Biligsaikhan Batjargal

Votes: 1

Watchers: 7

Dates

Created: 18/Aug/14 01:00

Updated: 24/Jul/15 08:53