Details
-
New Feature
-
Status: Closed
-
Minor
-
Resolution: Duplicate
-
1.7.1, 2.0.0
-
None
-
None
Description
The class org.apache.pdfbox.util.TextPosition offers only the position of text on a page and limited font information. (Many Chinese characters have no FontDescriptor, so the font name and other style attributes cannot be retrieved.)
I think many people use PDFBox to build client utilities that extract text and images,
and then reorganize them into a new article or book to be read on an iPad or mobile phone, with manual work to fix the layout.
But many books have complex layouts and colors and so many pages that this takes a lot of human effort. If more of the work could be done automatically, it would be more efficient.
So it would help to have a class named Text from which the precise position, font size, font style, color, and other attributes such as background color can easily be obtained.
Text-extraction steps such as excluding unnecessary text or making text more colorful would then be easier.
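To make the request concrete, here is a minimal sketch of what such a value object and one filtering step might look like. Everything in it is hypothetical: the `Text` class, its fields, and the `TextFilterSketch.excludeColor` helper are illustrative names and are not part of PDFBox. It only assumes that an extractor could somehow populate these fields.

```java
import java.awt.Color;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical value object for one run of text; NOT an existing PDFBox class. */
final class Text {
    final String content;
    final float x;            // horizontal position on the page
    final float y;            // vertical position on the page
    final float fontSize;
    final String fontStyle;   // e.g. "bold"; null when the style is unknown
    final Color color;        // text color
    final Color background;   // background color; null when unknown

    Text(String content, float x, float y, float fontSize,
         String fontStyle, Color color, Color background) {
        this.content = content;
        this.x = x;
        this.y = y;
        this.fontSize = fontSize;
        this.fontStyle = fontStyle;
        this.color = color;
        this.background = background;
    }
}

/** Hypothetical helper showing how unwanted text could be excluded by color. */
final class TextFilterSketch {
    /** Returns the runs whose text color differs from the unwanted color. */
    static List<Text> excludeColor(List<Text> runs, Color unwanted) {
        List<Text> kept = new ArrayList<>();
        for (Text t : runs) {
            if (!unwanted.equals(t.color)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample data made up for illustration: a heading and a gray watermark.
        List<Text> runs = Arrays.asList(
            new Text("Chapter 1", 72f, 700f, 18f, "bold", Color.BLACK, null),
            new Text("DRAFT", 200f, 400f, 48f, null, Color.LIGHT_GRAY, null));
        for (Text t : excludeColor(runs, Color.LIGHT_GRAY)) {
            System.out.println(t.content);
        }
    }
}
```

With color available on each run, dropping a watermark or restyling headings becomes a simple filter or map over the list instead of manual cleanup.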
Issue Links
- duplicates: PDFBOX-2246 PDFTextStripper should handle colors (Open)
- is related to: PDFBOX-1736 I need urgently to extract text color from pdf file (Closed)