[PDFBOX-2272] Can't extract vertical text correctly - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.8.6, 2.0.0
Fix Version/s: None
Component/s: Text extraction
Labels: None

Description

~~1.8.6 can't extract the Unicode due to failing to map the UCS2 CMap for 90ms-RKSJ-V.~~
2.0.0 extracts the text but does not handle the vertical layout.

Also see the file from ~~PDFBOX-2294~~ which contains both horizontal and vertical text.
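
To illustrate what "handling the vertical layout" would mean for reading order, here is a minimal sketch. It is not PDFBox code and not the approach of the attached patch. It assumes glyph positions are already known, with y growing downward, and that vertical text is read top to bottom within a column, with columns running right to left. The `Glyph` class, the `readVertical` method and the `columnTolerance` parameter are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class VerticalOrderSketch {

    /** Hypothetical glyph: a piece of text plus its page position (y grows downward). */
    static final class Glyph {
        final String text;
        final double x;
        final double y;

        Glyph(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Returns the glyph texts in vertical reading order: rightmost column first,
     * top to bottom within each column, one output line per column.
     * Glyphs whose x differs from the first glyph of the current column by at
     * most columnTolerance are treated as belonging to that column (an assumption).
     */
    static String readVertical(List<Glyph> glyphs, double columnTolerance) {
        // Sort a copy from right to left so columns are visited in reading order.
        List<Glyph> rightToLeft = new ArrayList<>(glyphs);
        rightToLeft.sort(Comparator.comparingDouble((Glyph g) -> g.x).reversed());

        // Group neighbouring x positions into columns.
        List<List<Glyph>> columns = new ArrayList<>();
        for (Glyph g : rightToLeft) {
            List<Glyph> last = columns.isEmpty() ? null : columns.get(columns.size() - 1);
            if (last != null && Math.abs(last.get(0).x - g.x) <= columnTolerance) {
                last.add(g);
            } else {
                List<Glyph> column = new ArrayList<>();
                column.add(g);
                columns.add(column);
            }
        }

        // Within each column, read from top to bottom.
        StringBuilder out = new StringBuilder();
        for (List<Glyph> column : columns) {
            column.sort(Comparator.comparingDouble((Glyph g) -> g.y));
            if (out.length() > 0) {
                out.append('\n');
            }
            for (Glyph g : column) {
                out.append(g.text);
            }
        }
        return out.toString();
    }
}
```

For four glyphs A (100, 10), B (100, 30), C (80, 10) and D (80, 30), `readVertical(glyphs, 1.0)` returns "AB" and "CD" on separate lines, whereas a row-by-row horizontal reading of the same positions would interleave the two columns.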

Attachments

test.txt (2 kB), 18/Aug/14 01:01, Biligsaikhan Batjargal
test.pdf (54 kB), 18/Aug/14 01:01, Biligsaikhan Batjargal
pdfbox_new_vertical_text_extraction.patch (6 kB), 23/Jul/15 11:10, Andreas Meier

Issue Links

is duplicated by: PDFBOX-2879 Wrong vertical text extraction for apache PDFBox 2.0.0 (Closed)
is related to: PDFBOX-800 Wrong text extract from vertical textboxes in pdf files (Open)
relates to: PDFBOX-2711 Japanese text not extracted (Closed)

People

Assignee: Unassigned
Reporter: Biligsaikhan Batjargal
Votes: 1
Watchers: 7

Dates

Created: 18/Aug/14 01:00
Updated: 24/Jul/15 08:53
